ABSTRACT

Baryon Acoustic Oscillations (BAOs) are a feature imprinted in the galaxy distribution by acoustic 
waves traveling in the plasma of the early universe. Their detection at the expected scale in large-scale 
structures strongly supports current cosmological models with a nearly linear evolution from redshift 
z ≈ 1000 and the existence of dark energy. Besides, BAOs provide a standard ruler for studying cosmic
expansion. In this paper we focus on methods for BAO detection using the measured correlation
function $\hat{\xi}$. For each method, we want to understand the tested hypothesis (the hypothesis
$\mathcal{H}_0$ to be rejected) and the underlying assumptions. We first present wavelet methods, which are
mildly model-dependent and mostly sensitive to the BAO feature. Then we turn to fully model-dependent
methods. We present the most often used method, based on the $\chi^2$ statistic, but we find
it has limitations. In general the assumptions of the $\chi^2$ method are not verified, and it only gives
a rough estimate of the significance. The estimate can become very wrong when considering more
realistic hypotheses, where the covariance matrix of $\hat{\xi}$ depends on cosmological parameters. Instead
we propose to use the $\Delta l$ method, based on two modifications: we modify the procedure for computing
the significance and make it rigorous, and we modify the statistic to obtain better results in the case of
varying covariance matrix. We verify with simulations that correct significances are different from the
ones obtained using the classical $\chi^2$ procedure. We also test a simple example of varying covariance
matrix. In this case we find that our modified statistic outperforms the classical $\chi^2$ statistic when
both significances are correctly computed. Finally we find that taking into account variations of the
covariance matrix can change both BAO detection levels and cosmological parameter constraints.
Subject headings: large-scale structure of Universe - distance scale - dark energy - cosmological
parameters



1. INTRODUCTION 

Large-scale structures in the Universe provide crucial 
information and can be used to test different cosmological 
models. This study is complementary to other observations
such as the Cosmic Microwave Background (CMB)
or Type Ia supernovae. Combining those different
observations makes it possible to cross-test models, to break
degeneracies, and to better constrain cosmological parameters.
The good agreement found recently between them and
the now-standard Lambda-Cold Dark Matter (ΛCDM)
model gives hope for this model to be a lasting
foundation.

CDM models with baryons predict the existence of 
acoustic waves traveling in the hot plasma before re- 
combination, when baryons and photons were coupled 
together. Those spherical waves originate from the com- 
petition between gravitation making over-densities col- 
lapse and the photon pressure. About 380 000 years 
after the Big Bang, baryons and photons decoupled and 
those spherical waves became frozen at the sound horizon
scale $r_s$. Because of their large size (≈ 153 Mpc),
their imprints in the matter density field have mainly
undergone linear evolution and they should be clearly seen
in current large-scale structures. Those acoustic waves,
which translate into an excess of correlation at the sound
horizon scale, are known as Baryon Acoustic Oscillations
(BAOs; Peebles & Yu 1970; Bassett & Hlozek 2010).

There are two different uses of BAOs that should be
distinguished. First they can be used as a very distinct
feature to confirm cosmological models. Indeed
their detection at the expected scale in large-scale structures
gives strong support for CDM models, with a
linear gravitational evolution from z ≈ 1000 and the
existence of dark energy. Concretely, the detection is made
through hypothesis testing, by finding that models with
BAOs are strongly preferred to models without BAOs.

The first convincing detection of BAOs in large-scale
structures was reported in the correlation function analysis
(Eisenstein et al. 2005) of the Sloan Digital Sky Survey
(SDSS, York et al. (2000)) Luminous Red Galaxies
(LRG, Eisenstein et al. (2001)) survey. It gave a 3.4σ
significance in Data Release 3 (DR3). It was followed by
other detections, as in the 2-degree Field Galaxy Redshift
Survey (2dFGRS, Colless et al. (2001)), using power
spectrum analysis (Cole et al. 2005) with a 2.5σ significance.
The power spectrum analysis is also applied in
Hütsi (2006) with a 3.3σ detection in the SDSS-LRG
sample. In Percival et al. (2007) and Percival et al. (2010)
the combined power spectrum of LRG and Main (Strauss
et al. 2002) samples of SDSS (with respectively DR5 and
DR7), together with the 2dFGRS survey is used to obtain
respective significances 3σ and 3.6σ. Very recently BAOs
were also detected in the power spectrum (Blake et al.
2011) of the WiggleZ Dark Energy Survey (Drinkwater et
al. 2010) at a higher redshift z = 0.6 with a significance

We must keep in mind that the different detection levels
cannot usually be compared. In Eisenstein et al.
(2005) and Cole et al. (2005) it is calculated with respect
to zero-baryon models (pure CDM models). In
Hütsi (2006) and Blake et al. (2011), the significance is
calculated with respect to the "no-wiggles" fits of
Eisenstein & Hu (1998), which remove the baryon oscillations
signature but keep the intermediate suppression of power
due to baryons. Finally in Percival et al. (2007) and
Percival et al. (2010), it is calculated with respect to power
spectrum models where oscillations are smoothed out using
splines.

The second use of BAOs consists in constraining
cosmological parameters. Again BAOs are very useful,
because they provide a statistical standard ruler (Seo
& Eisenstein 2003) with an absolute size given with
small uncertainty by CMB measurements (Komatsu et
al. 2009). So they directly constrain the redshift-distance
relation in redshift surveys. Besides, BAOs appear to
have the lowest systematic uncertainties among current
methods for studying cosmic expansion (Albrecht et al.
2006). They have been used in combination with other
cosmological probes to constrain cosmological parameters
more efficiently (Eisenstein et al. 2005; Tegmark et al.
2006; Percival et al. 2007; Sánchez et al. 2009; Percival
et al. 2010; Reid et al. 2010; Kazin et al. 2010; Blake et
al. 2011).



Note that in most studies, the aforementioned constraints
come not only from BAOs, but from the whole
information in the estimated correlation function or
power spectrum. Note also that BAOs do not need to
be detected before they can be used for parameter
constraints (Cabré & Gaztañaga 2011). One could think
that the BAO peak must be proved to be significant and
not a random fluctuation, before it is used as a standard
ruler. However all sources of uncertainty are normally
taken into account when obtaining the constraints (e.g.
in the covariance matrix of $\hat{\xi}$ when using the correlation
function), so this argument is not valid. The real
question is whether the accepted cosmological models are
correct, and BAO detection in a given sample is just a
contribution to support these models.

In this paper we focus on the first use of BAOs, i.e. 
on the BAO detection. We restrict the analysis to the 
correlation function, but most reasonings could also be 
applied to the power spectrum. 

The plan of this paper is as follows: we start in section 2
by discussing the correlation function estimation
and modeling, as well as the general procedure for BAO
detection. In section 3 we present wavelet methods for
BAO detection, which are mildly model-dependent. In
the rest of the paper we focus on fully model-dependent
methods. In section 4 we present the classical method
used for BAO detection, based on the $\chi^2$ statistic. We
find that this method does not provide the correct
significance. So we propose in section 5 a new method that
we call the $\Delta l$ method, based on two modifications to
the $\chi^2$ method. In section 6 we explain the other use of
BAOs, i.e. how parameter constraints can be obtained.
Finally we illustrate the different methods and results
using simulations in section 7.

2. BAO DETECTION IN THE CORRELATION FUNCTION 

2.1. Correlation function 

The correlation function $\xi$ is a second order statistic
that measures the clustering of a continuous field or point
process. More precisely for the distribution of galaxies, it
quantifies the excess of probability to find a pair of galaxies
in volumes $dV_1$ and $dV_2$ separated by $\mathbf{r}$, compared to
a random unclustered distribution

$dP_{12} = \bar{n}^2 \, [1 + \xi(\mathbf{r})] \, dV_1 \, dV_2$    (1)

with $\bar{n}$ the mean density of the point distribution.
With the isotropy hypothesis, $\xi$ only depends on the
norm $r = \|\mathbf{r}\|$ of the separation vector $\mathbf{r}$. However in
redshift space the field is not rigorously isotropic. In this
case, one is usually interested in the monopole $\xi(r)$ of
$\xi(\mathbf{r})$. In the following, by a slight abuse of language,
we refer to the monopole when speaking about
the correlation function.

Given a galaxy survey, the correlation function can be
estimated by comparing the number of pairs at distance
r with a random catalogue in the same volume. Different
estimators based on this method have been proposed
and empirically compared (Labatie et al. 2011; Kerscher
et al. 2000; Pons-Bordería et al. 1999). While Pons-Bordería
et al. (1999) did not recommend one estimator
for all cases, Kerscher et al. (2000) recommend to
use the Landy-Szalay estimator. The recommendation is
the same in the more recent study (Labatie et al. 2011),
where Landy-Szalay is found to be nearly unbiased for
current galaxy surveys. It is given by

$\hat{\xi}(r) = 1 + \frac{N_{RR}}{N_{DD}} \frac{DD(r)}{RR(r)} - 2 \, \frac{N_{RR}}{N_{DR}} \frac{DR(r)}{RR(r)}$    (2)

with DD(r), RR(r), and DR(r) the number of pairs at
a distance in [r ± dr/2] of respectively data-data, random-random
and data-random points, and $N_{DD}$, $N_{RR}$ and
$N_{DR}$ the total number of corresponding pairs in the
catalogues.
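
As an illustration of equation (2), the estimator can be evaluated bin by bin from pair counts. The sketch below is a minimal one and not the implementation used for the surveys cited above: the function name and the toy pair counts are assumptions made for the example, and the pair counting itself (the expensive step) is supposed to have been done already.

```python
def landy_szalay(dd, dr, rr, n_dd, n_dr, n_rr):
    """Landy-Szalay estimate in each separation bin (equation 2).

    dd, dr, rr       : pair counts per bin (data-data, data-random, random-random)
    n_dd, n_dr, n_rr : total numbers of pairs of each kind in the catalogues
    """
    xi_hat = []
    for dd_i, dr_i, rr_i in zip(dd, dr, rr):
        if rr_i == 0:
            raise ValueError("empty random-random bin: use more random points")
        xi_hat.append(
            1.0
            + (n_rr / n_dd) * dd_i / rr_i
            - 2.0 * (n_rr / n_dr) * dr_i / rr_i
        )
    return xi_hat


# Toy pair counts (hypothetical numbers). In the first bin the data pairs are
# distributed like the random pairs; in the second bin there are twice as
# many data-data pairs as expected for an unclustered distribution.
xi_toy = landy_szalay(dd=[10, 40], dr=[40, 80], rr=[100, 200],
                      n_dd=100, n_dr=400, n_rr=1000)
```

With these counts the first bin gives no excess of correlation and the second bin a positive excess, as expected from the definition in equation (1).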

2.2. Modeling the galaxy correlation function 

In the early universe prior to recombination, baryons 
are tightly coupled with photons. This results in acous- 
tic waves traveling in the plasma (BAOs) , but also in the 
suppression of power on small and intermediate scales 
compared to a CDM model without baryons. After the 
time of decoupling (also called drag epoch), the matter 
density field becomes pressureless, allowing the perturbations
to grow by gravitational instability. This evolution
can be analytically solved in the linear regime where
fluctuations are small, and only the overall amplitude of the
power spectrum changes.



Eisenstein & Hu (1998) provide fitting formulae for
the linear power spectrum, with a dependence on
cosmological parameters. From the linear power spectrum, the
linear correlation function is simply obtained by Fourier
transform. The effect of BAOs is clearly identified as a
series of wiggles in the power spectrum, and as a localized
bump in the correlation function at the sound horizon
scale $r_s$ (see figure 2).

To fully model the matter correlation function, one
also has to take into account non-linear evolution. This
can be done using N-body simulations, which empirically
provide a correction from the linear to the non-linear
correlation functions. Smith et al. (2003) provide
corrections for scale-free power spectra. One also has to
correct for the non-linear degradation of the acoustic peak.
Eisenstein et al. (2007) found that this is well approximated
by a Gaussian smoothing of the acoustic feature
in real and redshift spaces.

A last step to model the galaxy correlation function is 
to take into account redshift distortions and galaxy bias 
with respect to matter. Again this can be done using N- 
body simulations, where dark matter halos are populated 
using a halo model. 

Models of galaxy correlation with BAOs are constructed
using the linear matter correlation function with
non-zero baryon fraction $\Omega_b > 0$, and further applying
the different corrections. On the other hand, models of
galaxy correlation without BAOs can be obtained by setting
the baryon fraction to zero, $\Omega_b = 0$. One can also
construct no-BAO models with $\Omega_b > 0$, by using only
the non-oscillatory part of the power spectrum to remove
the effect of BAOs (e.g. the no-wiggles fit of Eisenstein
& Hu (1998)). In this case, BAOs are artificially erased
and the models are non-physical. Yet they make it possible to test
the existence of BAOs independently of other baryonic
effects.

2.3. BAO detection by hypothesis testing 
Let us define the two different hypotheses

$\mathcal{H}_0$ : no-BAO hypothesis
$\mathcal{H}_1$ : BAO hypothesis

BAO detection is equivalent to this problem of hypothesis
testing. The common procedure is to design
a test statistic to assess the truth of the null hypothesis
$\mathcal{H}_0$. From the test statistic obtained with the measurement,
one computes a p-value and a significance. If the
measurement is found to be more unlikely than a given
threshold under $\mathcal{H}_0$, one rejects $\mathcal{H}_0$ and accepts $\mathcal{H}_1$ (see
section 4.3).

We focus on the case where the data measurement
is the correlation function estimated in different bins,
$\hat{\xi} = (\hat{\xi}_i)_{1 \le i \le n}$. Such a binning is always present for the
measurement, and thus for the model correlations in the
hypotheses $\mathcal{H}_0$ and $\mathcal{H}_1$. As a slight abuse of language,
we will use the terms estimated correlation function and
model correlation functions for designating their binned
versions.

The hypotheses $\mathcal{H}_0$ and $\mathcal{H}_1$ are based on BAO and
no-BAO models of correlation function, such as the ones
presented in section 2.2. The hypotheses also include the
noise of the measurement, i.e. the covariance matrix
$C = (C_{ij})_{1 \le i,j \le n}$ of $\hat{\xi}$.

We will see in section 3 that wavelet methods are
mainly sensitive to the BAO feature in the correlation
function, and not to the global shape of the model.

On the other hand, usual detection methods (e.g. the
$\chi^2$ method) are based on full modeling of the correlation
function. In this case, BAO and no-BAO models of
correlation function $\xi_{\mathrm{BAO},\theta}$ and $\xi_{\mathrm{noBAO},\theta}$ are parameterized
by $\theta \in \Theta$ to account for variations of cosmological parameters.
The hypotheses are

$\mathcal{H}_0 : \exists\, \theta \in \Theta$ s.t. $\hat{\xi} \sim \mathcal{N}(\xi_{\mathrm{noBAO},\theta}, C_{\mathrm{noBAO},\theta})$
$\mathcal{H}_1 : \exists\, \theta \in \Theta$ s.t. $\hat{\xi} \sim \mathcal{N}(\xi_{\mathrm{BAO},\theta}, C_{\mathrm{BAO},\theta})$

Most methods work with these hypotheses, where $\hat{\xi}$
is Gaussian. Ideally the hypotheses are sampled by N-body
simulations for each model, which does not force $\hat{\xi}$
to be Gaussian. Yet we will see in section 7.3 that the
Gaussian approximation works very well for our lognormal
simulations.

The classical $\chi^2$ method used for BAO detection that
we present in section 4 simplifies the hypotheses by
assuming constant covariance matrices (i.e. independent
of the model). The reason is that it can be hard to
evaluate the covariance matrix for all tested models. In a
modified version of the $\chi^2$ method that we propose in
section 5 this simplification is not imposed.

The distribution of $\hat{\xi}$ is entirely specified for each
hypothesis when fixing $\theta$, which is not the case when allowing
$\theta$ to vary. In the first case the hypotheses are said to
be simple, whereas in the second case they are said to be
composite¹.

BAO detection makes more sense when allowing variations
of $\theta$, since it takes into account uncertainties in
cosmological parameters. However we will see in section 4,
with the classical $\chi^2$ method, that one has to be careful
when testing composite hypotheses.

3. WAVELET FILTERING METHODS 



As explained in section 2.2, BAOs manifest as a characteristic
peak in the correlation function at the acoustic
scale $r_s$. For detecting this feature new methods have
recently emerged, based on wavelet analysis (Xu et al.
2010; Tian et al. 2011; Arnalte-Mur et al. 2011). Wavelets
are commonly used in image analysis (Mallat 2008; Starck
et al. 2010). They
are specially suited for the analysis of data at different
scales, and identification of characteristic patterns
or structures.

Here the characteristic structure is the BAO feature in
the correlation, with uncertainty in its scale and shape.
Uncertainty in the scale comes from a wrong fiducial cosmology
to convert redshifts into distances, and a weak dependence
of $r_s$ on cosmological parameters. Uncertainty
in the shape is due to non-linear evolution and redshift
distortions, which are subject to modeling errors (see section
2.2).



Our focus here is BAO detection, so we present two
different methods that have been developed for this purpose.
In these methods, a wavelet $w = (w(r_i))_{1 \le i \le n}$ acts
as a peak finder to detect excess in the measured correlation
$(\hat{\xi}(r_i))_{1 \le i \le n}$. The wavelet $w(R, s)$ is parametrized
by two parameters R and s, linked respectively to the
scale and width of the peak. A simple way to do it is to
consider an original peak-finding wavelet $w_0$ and rescale
it as $w_i(R, s) = w_0\left(\frac{r_i - R}{s}\right)$ (see figure 1).

One obtains a filtered signal $S_w(R, s)$ for every pair
(R, s)

$S_w(R, s) = \langle w(R, s), \hat{\xi} \rangle = \sum_{i=1}^{n} w_i(R, s) \, \hat{\xi}(r_i)$    (3)

The next step is to divide $S_w(R, s)$ by its noise $\sigma_w(R, s)$
under $\mathcal{H}_0$ to obtain Z-scores $Z_w(R, s)$. The parameters
$(R_{max}, s_{max})$ giving the maximum Z-score can be used
to estimate the BAO scale and width.

$Z_w(R, s) = \frac{S_w(R, s)}{\sigma_w(R, s)}$    (4)

$Z_w^{max} = Z_w(R_{max}, s_{max})$    (5)

1 In statistics a hypothesis is said to be simple when the
distribution of the random variable is completely specified. It is said
to be composite when the distribution is not completely specified.
For example a union of simple hypotheses gives a composite
hypothesis.


The two hypotheses are roughly that the maximum
response $Z_w^{max}$ is negligible under $\mathcal{H}_0$ (no peak is found),
and that there is a non-negligible signal under $\mathcal{H}_1$ (a peak
is found). To reject $\mathcal{H}_0$ one performs simulations without
BAOs, and computes how rarely a value of $Z_w^{max}$ as high
as in the data is observed. This gives a p-value and thus
the significance of the detection.
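
The wavelet procedure described above can be sketched numerically as follows. This is only a toy illustration under strong assumptions that are not taken from the studies discussed here: the no-BAO simulations are independent Gaussian noise around zero, the noise of the filtered signal is estimated from a separate set of such simulations, the "data" contain an artificial Gaussian bump at 110 in the units of r, and the grid of (R, s) values is arbitrary. All function names are hypothetical.

```python
import math
import random
import statistics


def mexican_hat(x):
    """Unnormalized 1D Mexican hat, used as the peak-finding wavelet w_0."""
    return (1.0 - x * x) * math.exp(-0.5 * x * x)


def filtered_signal(r, xi, R, s):
    """S_w(R, s) = sum_i w_0((r_i - R) / s) * xi(r_i)."""
    return sum(mexican_hat((ri - R) / s) * x for ri, x in zip(r, xi))


def max_z_score(r, xi, grid, sigma_w):
    """Maximum over the (R, s) grid of Z_w = S_w / sigma_w."""
    return max(filtered_signal(r, xi, R, s) / sigma_w[(R, s)] for R, s in grid)


def empirical_p_value(z_data, z_sims):
    """Fraction of no-BAO simulations with a response at least as high as the data."""
    return sum(z >= z_data for z in z_sims) / len(z_sims)


rng = random.Random(0)
r = [40.0 + 5.0 * i for i in range(30)]          # bin centres (toy units)
grid = [(R, s) for R in (90.0, 100.0, 110.0, 120.0) for s in (10.0, 20.0)]
noise = 1.0e-3                                    # toy noise level per bin


def simulate_no_bao():
    """Toy no-BAO realization: pure noise, no smooth component."""
    return [rng.gauss(0.0, noise) for _ in r]


# Noise of the filtered signal under the no-BAO hypothesis, from simulations.
calibration = [simulate_no_bao() for _ in range(200)]
sigma_w = {
    (R, s): statistics.pstdev([filtered_signal(r, xi, R, s) for xi in calibration])
    for R, s in grid
}

# Distribution of the maximum Z-score under the no-BAO hypothesis.
z_sims = [max_z_score(r, simulate_no_bao(), grid, sigma_w) for _ in range(500)]

# Toy "data": noise plus an artificial bump centred on 110.
data = [x + 1.0e-2 * math.exp(-0.5 * ((ri - 110.0) / 10.0) ** 2)
        for ri, x in zip(r, simulate_no_bao())]
z_data = max_z_score(r, data, grid, sigma_w)
p_value = empirical_p_value(z_data, z_sims)
```

Because the maximum is taken over the whole grid in the simulations as well as in the data, the resulting p-value already accounts for the freedom in (R, s). With a finite number of simulations, an empirical p-value of zero only means that the true value is below roughly one over the number of simulations.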

The major advantage of using wavelets is that they are 
mainly sensitive to the BAO feature, and not to smooth 
changes in the correlation function. Furthermore, varying
the scale of the analyzing wavelet allows for dilations of
the correlation function (to probe different cosmologies),
and varying its width allows for variations in the
shape of the BAO peak. The consequence is that the
wavelet response is mainly related to the existence of a 
BAO peak, and mildly dependent on the whole model- 
ing of the correlation function. In other words, wavelet 
methods are robust to small modeling errors in the cor- 
relation function. 

The price to pay is that they are outperformed by some
model-dependent methods, when there are no modeling
errors in $\mathcal{H}_0$ and $\mathcal{H}_1$. As we will see in section 5, the
$\Delta\chi^2$ (respectively $\Delta l$) statistic can be seen as a generalized
likelihood ratio between $\mathcal{H}_0$ and $\mathcal{H}_1$ in the case of
constant (respectively varying) covariance matrix. The
interest of the likelihood ratio is that it is optimal in the
Neyman-Pearson sense for simple hypotheses (see section
4.3 and appendix C). For composite hypotheses there is
no such notion of optimality. But we also find in section
7 that the generalized likelihood ratio $\Delta l$ gives better
results than $\Delta\chi^2$ in the case of varying covariance matrix.
Since wavelet statistics are not directly linked to the
likelihood ratio, we expect them to perform less well than a
generalized likelihood ratio.

A wavelet detection method was used in Tian et al.
(2011) on the SDSS Main sample. An angular average
of the 3D anisotropic correlation function is computed,
by applying a flat weighting in all directions, instead of
the angular average on the sphere. The justification is
that the BAO feature is sharpened in the line-of-sight
direction, and this type of weighting gives more importance
to it. The correlation is further analyzed with a
Mexican hat wavelet (see figure 1). Using Gaussian simulations
with a no-wiggles power spectrum, they find that
only 0.2% of the simulations have the statistic $Z_w^{max}$ at
the same level as the data, which means a 3.1σ detection.




FIG. 1. Different analyzing wavelets used for BAO detection.
We show the Mexican hat with parameters $R = 113.6 \, h^{-1}$ Mpc,
$s = 20 \, h^{-1}$ Mpc (blue) and the BAOlet with parameters $R =
116 \, h^{-1}$ Mpc, $s = 36 \, h^{-1}$ Mpc (red). These parameters correspond
to the maximum responses $Z_w^{max}$ in the respective studies Tian et
al. (2011) and Arnalte-Mur et al. (2011).



In Arnalte-Mur et al. (2011) two different galaxy
samples of the SDSS DR7 are used, to compute the
mean density profile of the Main sample around LRGs.
This signal is analyzed with a 3D wavelet called BAOlet
(see figure 1). Because the wavelet is isotropic, it is
equivalent to applying a 1D wavelet transform on the
cross-correlation LRG-Main. Simulations of $\mathcal{H}_0$ are
made by replacing LRGs by random centers to show
that LRGs are located at special positions. Using this
hypothesis, a signal as high as $Z_w^{max}$ is found with a
probability $p = 4 \times 10^{-5}$ in simulations, corresponding
to a 4.1σ detection. Again this cannot be compared to
other existing methods, because the tested hypothesis
$\mathcal{H}_0$ is very different.
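
The conversion between a p-value and a number of σ used in the two wavelet studies can be checked with a few lines. The sketch below assumes the two-sided Gaussian convention, p = 2Φ(-n) with Φ the standard normal cumulative distribution function; this convention is an assumption of the example, chosen because it reproduces the quoted values. The function name is hypothetical.

```python
from statistics import NormalDist


def n_sigma(p_value):
    """Number of sigma n such that p = 2 * Phi(-n) (two-sided Gaussian convention)."""
    if not 0.0 < p_value < 1.0:
        raise ValueError("p-value must lie strictly between 0 and 1")
    return -NormalDist().inv_cdf(p_value / 2.0)


# p-values quoted in the text for the two wavelet analyses.
tian_2011 = n_sigma(0.002)          # 0.2% of the simulations
arnalte_mur_2011 = n_sigma(4e-5)    # p = 4 x 10^-5
```

Rounded to one decimal, the two values agree with the 3.1σ and 4.1σ quoted above.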



4. $\chi^2$ METHOD

The $\chi^2$ method is the classical method used for BAO
detection and can deal with the general case of varying
cosmological parameters. Unlike the wavelet methods presented
in section 3, it is fully model-dependent, so it is
mainly useful when all effects in the correlation function
are well understood.

The $\chi^2$ method is also designed for hypotheses where
the measurement $\hat{\xi}$ is Gaussian. In the rest of the paper
we will only consider such hypotheses. In section 7.3 we
will see using simulations that the Gaussian approximation
is well justified.

4.1. The $\chi^2$ statistic

For a measured correlation function $\hat{\xi} \sim \mathcal{N}(\xi_m, C)$ the
$\chi^2$ statistic writes

$\chi^2 = \langle \hat{\xi} - \xi_m, \, C^{-1} (\hat{\xi} - \xi_m) \rangle$    (6)

$\quad = \sum_{1 \le i,j \le n} [\hat{\xi}(r_i) - \xi_m(r_i)] \, C^{-1}_{i,j} \, [\hat{\xi}(r_j) - \xi_m(r_j)]$    (7)

Supposing that the model is correct (i.e. $\hat{\xi} \sim
\mathcal{N}(\xi_m, C)$), the $\chi^2$ statistic follows a $\chi^2_n$ distribution, i.e.
a chi-square distribution with n degrees of freedom. The
$\chi^2_n$ distribution can be interpreted as the one followed by
the sum of the squares of n independent standard normal
variables. If $X_1, \ldots, X_n$ are n such i.i.d. Gaussian
variables then $\sum_{i=1}^{n} X_i^2$ follows a $\chi^2_n$ distribution.
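
The quadratic form of the χ² statistic (a residual vector contracted with the inverse covariance matrix) can be evaluated without inverting the covariance matrix explicitly. The sketch below is one possible way to do it, using a Cholesky factorization written in pure Python; the function names and the small test matrices are assumptions of the example, not part of the analysis described here.

```python
import math


def cholesky(cov):
    """Lower-triangular L with L L^T = cov, for a symmetric positive definite matrix."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = cov[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if acc <= 0.0:
                    raise ValueError("covariance matrix is not positive definite")
                low[i][j] = math.sqrt(acc)
            else:
                low[i][j] = acc / low[j][j]
    return low


def chi2(xi_hat, xi_model, cov):
    """chi^2 = (xi_hat - xi_model)^T C^{-1} (xi_hat - xi_model)."""
    low = cholesky(cov)
    resid = [a - b for a, b in zip(xi_hat, xi_model)]
    # Forward substitution solves L y = resid, so that chi^2 = y . y
    y = []
    for i, row in enumerate(low):
        y.append((resid[i] - sum(row[k] * y[k] for k in range(i))) / row[i])
    return sum(v * v for v in y)


# Uncorrelated bins: chi^2 reduces to a sum of squared, normalized residuals.
chi2_diag = chi2([2.0, 3.0], [0.0, 0.0], [[4.0, 0.0], [0.0, 9.0]])
# Correlated bins: the off-diagonal term changes the value.
chi2_corr = chi2([1.0, 1.0], [0.0, 0.0], [[2.0, 1.0], [1.0, 2.0]])
```

In the uncorrelated case each bin contributes the square of its residual divided by its variance; in the correlated case the positive correlation reduces the value obtained for two residuals of the same sign.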

4.2. $\chi^2$ method for BAO detection

Let us show how the BAO detection is usually performed.
We note $\theta = (\theta_1, \ldots, \theta_k) \in \Theta$ the parameters
on which the model correlation functions with and
without BAOs, $\xi_{\mathrm{BAO},\theta}$ and $\xi_{\mathrm{noBAO},\theta}$, depend. The hypotheses
are given by

$\mathcal{H}_0 : \exists\, \theta \in \Theta$ s.t. $\hat{\xi} \sim \mathcal{N}(\xi_{\mathrm{noBAO},\theta}, C)$
$\mathcal{H}_1 : \exists\, \theta \in \Theta$ s.t. $\hat{\xi} \sim \mathcal{N}(\xi_{\mathrm{BAO},\theta}, C)$

As mentioned in section 2.3, the $\chi^2$ method tests hypotheses
with a constant covariance matrix C. The parameters
$\theta$ are not directly cosmological parameters but
are linked to them. For example in Eisenstein et al.
(2005), they are given by a dilation parameter $\alpha$ (to account
for a wrong fiducial cosmology to convert redshifts
into distances), an amplitude parameter $b^2$ (to account
for galaxy bias, redshift distortions, and power spectrum
normalization $\sigma_8$), and the parameter $\Omega_m h^2$ (determining
the horizon scale at matter-radiation equality, the
amplitude of the BAO peak, and more moderately the
position of the peak). Other parameters also have an impact
on the expected correlation function ($\Omega_b h^2$ and the
spectral index n) but they are well constrained by CMB
data and can be fixed as a good approximation.
The $\chi^2$ has a dependence on the parameters $\theta$

$\chi^2_{\mathrm{BAO},\theta} = \langle \hat{\xi} - \xi_{\mathrm{BAO},\theta}, \, C^{-1} (\hat{\xi} - \xi_{\mathrm{BAO},\theta}) \rangle$    (8)

$\chi^2_{\mathrm{noBAO},\theta} = \langle \hat{\xi} - \xi_{\mathrm{noBAO},\theta}, \, C^{-1} (\hat{\xi} - \xi_{\mathrm{noBAO},\theta}) \rangle$    (9)

For each class of models one can look at the $\chi^2$ best-fits,
$\min_\theta \chi^2_{\mathrm{noBAO},\theta}$ and $\min_\theta \chi^2_{\mathrm{BAO},\theta}$. It is a widely used
result that the best-fit $\chi^2$ value follows a $\chi^2_{n-k}$ distribution,
assuming that the true model is inside the fitting
class. So the number of degrees of freedom decreases
by the number of parameters in the fit. We stress that
this result (see appendix A) is only rigorously valid when
the space of model correlations is affine. We recall that
the measurement vector $\hat{\xi}$ and the model correlations
$\xi_m(\theta)$ are n-dimensional binned versions of their continuous
counterparts. So the set of all model correlations
$(\xi_m(\theta))_{\theta \in \Theta}$ constitutes a subset of $\mathbb{R}^n$, which needs to
be a k-dimensional affine space for the previous result to
hold (see appendix A).

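This result can be used on $\min_\theta \chi^2_{\mathrm{BAO},\theta}$ to verify that
data are compatible with $\mathcal{H}_1$. More precisely it can be
tested whether $\min_\theta \chi^2_{\mathrm{BAO},\theta}$ is compatible with its
distribution when the true model is in $\mathcal{H}_1$

$\min_\theta \chi^2_{\mathrm{BAO},\theta} \sim \chi^2_{n-k}$    (10)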

For the rejection of $\mathcal{H}_0$, the usual procedure is more
complex. We add an artificial parameter in the fit, which
accounts for the presence of BAOs in the model correlation.
For example this parameter $\beta$ can be a weighting of
$\xi_{\mathrm{BAO},\theta}$ and $\xi_{\mathrm{noBAO},\theta}$ in the model correlation function
$\xi_{\beta,\theta}$ (equation 11).

Under $\mathcal{H}_0$, the expected correlation function is of the
form $\xi_{\mathrm{noBAO},\theta}$ for the true parameters $\theta = \theta_0$, but it is
also of the form $\xi_{\beta,\theta}$ with $\beta = 0$ and $\theta = \theta_0$. Thus the
best-fit $\chi^2$ value in the no-BAO class follows a $\chi^2_{n-k}$
distribution, and the best-fit $\chi^2$ value in the extended class
follows a $\chi^2_{n-(k+1)}$ distribution (since $\beta$ is an additional
parameter).

We are in the case of two nested classes of models,
which both contain the true model. In this case, the
difference of the best-fit values follows a chi-square
distribution with a number of degrees of freedom equal to the
difference of parameters between the two classes. Again this
result is not rigorously true in the general case, but only
when the two spaces of model correlations are affine (see
appendix B).

Here there is only one additional parameter, $\beta$, in the
extended class, thus the difference of the best-fit values
follows a $\chi^2_1$ distribution.

$\Delta\chi^2_{global} = \min_\theta \chi^2_{\mathrm{noBAO},\theta} - \min_{\beta,\theta} \chi^2_{\beta,\theta} \sim \chi^2_1$    (12)



This accounts for the fact that fitting an additional parameter,
which is not required by the true model, only
moderately decreases the best-fit value. To reject $\mathcal{H}_0$,
one can look at this difference $\Delta\chi^2_{global}$ and compute how
unlikely it is under $\mathcal{H}_0$ (i.e. for a $\chi^2_1$ distribution).
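
The nested-model result can be checked by simulation in the one situation where it is exact, i.e. when the model spaces are linear. The sketch below is a toy check under assumptions that are not those of a real BAO fit: unit-variance independent noise, a no-BAO class of the form A t0 and an extended class A t0 + B t1 with unconstrained amplitudes, and arbitrary template shapes t0 and t1. In that case the improvement of the best-fit value brought by the extra parameter is the squared projection of the data on the part of t1 orthogonal to t0, which should follow a chi-square distribution with one degree of freedom when the data are drawn from the smaller class.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def delta_chi2_nested(data, t0, t1):
    """Best-fit chi^2 of A*t0 minus best-fit chi^2 of A*t0 + B*t1 (unit variances)."""
    # Component of t1 orthogonal to t0 (one Gram-Schmidt step).
    coef = dot(t1, t0) / dot(t0, t0)
    e1 = [b - coef * a for a, b in zip(t0, t1)]
    # The extra parameter removes exactly the squared projection of the data on e1.
    return dot(data, e1) ** 2 / dot(e1, e1)


rng = random.Random(1)
r = [50.0 + 10.0 * i for i in range(10)]
t0 = [(50.0 / ri) ** 2 for ri in r]                                  # smooth toy template
t1 = [math.exp(-0.5 * ((ri - 110.0) / 10.0) ** 2) for ri in r]       # toy bump template

# Realizations drawn from the smaller class (true B = 0), with unit noise.
samples = [
    delta_chi2_nested([a + rng.gauss(0.0, 1.0) for a in t0], t0, t1)
    for _ in range(4000)
]
mean_delta = sum(samples) / len(samples)                 # expected close to 1
frac_above = sum(s > 3.84 for s in samples) / len(samples)  # expected close to 0.05
```

When the models depend non-linearly on their parameters, or when parameters are restricted to a range, this check has no reason to succeed, which is the situation discussed in the text for realistic BAO models.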

In practice, the best-fit in the whole extended class is
replaced by the best-fit in the BAO class (i.e. restricting
to $\beta = 1$)

$\Delta\chi^2 = \min_\theta \chi^2_{\mathrm{noBAO},\theta} - \min_\theta \chi^2_{\mathrm{BAO},\theta}$    (13)

This difference $\Delta\chi^2$ is necessarily less than $\Delta\chi^2_{global}$ of
equation (12), which follows a $\chi^2_1$ distribution under $\mathcal{H}_0$.
Thus for a realization with an arbitrary value $\Delta\chi^2 = x$
we have

$P(\Delta\chi^2 > x \mid \mathcal{H}_0) \le P(\Delta\chi^2_{global} > x \mid \mathcal{H}_0)$    (14)

A $\chi^2_1$ distribution is simply the distribution followed by
the square of a standard normal variable. Noting $\Phi$ the
cumulative distribution function of a standard Gaussian
variable, we get for $x \ge 0$

$P(\Delta\chi^2 > x \mid \mathcal{H}_0) \le P(\Delta\chi^2_{global} > x \mid \mathcal{H}_0) = 2 \Phi(-\sqrt{x})$    (15)

which corresponds to a number of σ equal to $\sqrt{x}$ for a
normal distribution. Thus, when $\Delta\chi^2 > 0$, one can evaluate
the significance of the BAO detection as $\sqrt{\Delta\chi^2}\,\sigma$.
A significance is originally given in terms of a p-value,
i.e. the probability of obtaining a value at least as extreme
as the measurement under $\mathcal{H}_0$. When given as a number of σ, it is simply
the corresponding number of standard deviations from the
mean for a Gaussian variable.



In Eisenstein et al. (2005) the difference of chi-square
equals $\Delta\chi^2 = 11.7$, corresponding to a 3.4σ detection
using this method. In Percival et al. (2010) it is equal to
$\Delta\chi^2 = 13.1$, corresponding to a 3.6σ detection.
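
The arithmetic behind these two numbers is simply a square root, and the corresponding upper bound on the p-value follows from the tail of a chi-square distribution with one degree of freedom. A minimal sketch, with a hypothetical function name, is given below; it only reproduces the conversion of the classical procedure and says nothing about whether its assumptions hold.

```python
import math


def delta_chi2_to_significance(delta_chi2):
    """Classical conversion of a chi-square difference into a significance.

    Returns the number of sigma, sqrt(delta_chi2), and the bound
    2 * Phi(-sqrt(delta_chi2)) on the p-value, written with erfc.
    """
    if delta_chi2 < 0.0:
        raise ValueError("the conversion is only defined for delta_chi2 >= 0")
    n_sig = math.sqrt(delta_chi2)
    p_bound = math.erfc(n_sig / math.sqrt(2.0))  # equals 2 * Phi(-n_sig)
    return n_sig, p_bound


for label, value in (("Eisenstein et al. (2005)", 11.7),
                     ("Percival et al. (2010)", 13.1)):
    n_sig, p_bound = delta_chi2_to_significance(value)
    print(f"{label}: {n_sig:.1f} sigma, p-value bound {p_bound:.1e}")
```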



Because of the inequality in equation (14) the method
may seem conservative. However because the assumptions
of the method are not verified, we will see in section
7.4.1 that the method actually overestimates the
significance.

The extended model correlation function used in this
procedure is

$\xi_{\beta,\theta} = \beta \, \xi_{\mathrm{BAO},\theta} + (1 - \beta) \, \xi_{\mathrm{noBAO},\theta}$    (11)

4.3. Limitations of the $\chi^2$ method

The Neyman-Pearson lemma states that, when per- 
forming a hypothesis test between Ho an d H\, the most 
powerful tests are based on the likelihood ratio 



A(0 = 



(16) 



More precisely, the most powerful test of significance 
a is 

• if A(£) < i] then accept H\ (i.e. reject Ho) 

• if A(£) > i] then accept Ho (i-e. reject H\) 



Let us consider a common case where the parameters 
9 ar e given by 9 = (fl m h 2 , a, b) as introduced in section 



4.2 



For illustrative purposes, we only take into account 
the dependence in the bias b of the covariance matrix. It 
gives a multiplicative factor b 2 in the expected correlation 
function, and at first approximation (up to shot noise) a 
multiplicative factor b 4 in the covariance matrix. So the 
covariance matrices are given by 



Cbaoj 



C 



noBAO 



g (X 6 4 C 



(20) 



In the classical \ 2 method, Xbao,0 and XnoBAO,e are 
computed with a constant covariance matrix C. So for a 
realization AÍ; with A > 0, we get the BAO best-fit 



with a the probability of rejecting Ho if it is true (type 
I error) 

a = P ( A(D < v I Ho) (17) 



^XBAoAM) 



The power of the test is defined as the probability of 
accepting H\ if it is true. It is equal to 1 — (3 where (3 
is the probability of type II error, i.e. the probability of 
not accepting H\ if it is true. 



P = P (A(0>»?|Wi 



(18) 



In practice, such tests with given thresholds are not 
really used, and it is more common to cite the significance 
level for the realization value. For a realization with an 
arbitrary value A(£) = x, the significance (given as a 
p-value) is 

(19) 



a(x)=P [A{0<x\H 



As we show in appendix [ÜJ the Neyman-Pearson 
lemma implies that the expected significance under H\ 
obtained with A(£) is better than with any other statis- 
tic. More precisely, under H\, the expected p-value of 
equation ( 19 ) is smaller, and the expected number of o~ 



is larger for A(£) than for any other statistic. 

Note that the statistic A(£) is optimal in this sense, but 
we need to know its distribution under Ho (to compute 
the significance a(x) corresponding to a realization value 
x). Moreover in the case of composite hypotheses, the 

dist ribut ion of A(£) is not well-defined under Ho (see sec- 
2.3 ) so the significance is also not well-defined. The 
is that its distribution is identical 
In this case 



tion 



advantage of A^ 2 



Vglob 

for every model in Ho (a X\ distribution) . 
one is able to give a significance even with composite 
hypotheses. Yet this result is subject to a regularity as- 
sumption, that spaces of model correlation functions are 
affine. Because it is no t verified, we will see with simu- 
lations in section 



7.4.1 



that the distribution of A% 2 



global 

is quite different from a x 2 an d that the % 2 method only 
gives a rough estimate of the significance. 

Thc estimate can be even more wrong when consider- 
ing more realistic hypotheses, where covariance matriccs 
depend on the model 

Ho ■ 3 9 € 6 S.t. £ ~ N (£noBAO,9, C no BAOfi) 

H\:39ee s.t. i ~ M (ÍBAOfi, C B Ao,e) 



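$$\min_\theta \chi^2_{\mathrm{BAO},\theta}(\lambda\hat{\xi}) = \min_\theta \left\langle \lambda\hat{\xi} - \xi_{\mathrm{BAO},\theta},\; C^{-1}\left(\lambda\hat{\xi} - \xi_{\mathrm{BAO},\theta}\right) \right\rangle$$

$$= \lambda^2 \min_\theta \left\langle \hat{\xi} - \frac{\xi_{\mathrm{BAO},\theta}}{\lambda},\; C^{-1}\left(\hat{\xi} - \frac{\xi_{\mathrm{BAO},\theta}}{\lambda}\right) \right\rangle$$

$$= \lambda^2 \min_\theta \chi^2_{\mathrm{BAO},\theta}(\hat{\xi}) \qquad (21)$$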



The last equality comes from the role of b, which enables any positive multiplication of the model. The same reasoning can be applied to min_θ χ²_noBAO,θ, which gets multiplied by λ², and thus the statistic Δχ² also gets multiplied by λ².

Given the hypotheses with varying covariance matrix, H0 realizations with θ = (Ω_m h², α, b_1) can be obtained from H0 realizations with θ = (Ω_m h², α, b_0) by a multiplicative factor (b_1/b_0)². As a result, the distribution of Δχ² gets dilated by a factor (b_1/b_0)⁴. This creates very large differences between the different models in H0, so the classical χ² method provides very bad estimates of the significance. The conclusion is that the classical χ² method cannot be used in the case of varying covariance matrix.
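The scaling argument can be checked numerically. The following is a minimal sketch, not the authors' code: the templates `t_nobao` and `t_bao`, the diagonal covariance and the factor `lam` are all hypothetical toy choices, and the model family is reduced to a free amplitude B = b² ≥ 0 multiplying a fixed template.

```python
import numpy as np

def min_chi2(xi, template, cinv):
    """Minimum over B >= 0 of <xi - B t, C^-1 (xi - B t)>, solved analytically in B."""
    b_best = max((template @ cinv @ xi) / (template @ cinv @ template), 0.0)
    resid = xi - b_best * template
    return float(resid @ cinv @ resid)

def delta_chi2(xi, t_nobao, t_bao, cinv):
    """Delta chi^2 = min chi2_noBAO - min chi2_BAO for a constant covariance."""
    return min_chi2(xi, t_nobao, cinv) - min_chi2(xi, t_bao, cinv)

rng = np.random.default_rng(0)
r = np.linspace(25.0, 195.0, 18)                       # toy bin centres
t_nobao = (r / 100.0) ** -2                            # toy smooth template
t_bao = t_nobao + 0.3 * np.exp(-0.5 * ((r - 105.0) / 10.0) ** 2)  # toy bump
cinv = np.linalg.inv(0.25 * np.eye(r.size))            # toy constant covariance

xi = 6.25 * t_nobao + 0.5 * rng.standard_normal(r.size)  # one H0-like realization
lam = 1.7
base = delta_chi2(xi, t_nobao, t_bao, cinv)
scaled = delta_chi2(lam * xi, t_nobao, t_bao, cinv)
print(scaled, lam ** 2 * base)   # the two numbers should agree
```

Because the amplitude is free, multiplying the realization by `lam` multiplies both minima, and hence Δχ², by `lam**2`, which is the behaviour used in the argument above.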

5. MODIFIED χ² METHOD

In this section we propose two modifications to the χ² method to overcome its limitations. The first modification makes it possible to obtain the correct significance in all cases. So unlike the classical χ² method, our modified method can be applied to hypotheses with varying covariance matrices

$$\mathcal{H}_0 : \exists\, \theta \in \Theta \ \text{s.t.}\ \hat{\xi} \sim \mathcal{N}\left(\xi_{\mathrm{noBAO},\theta},\, C_{\mathrm{noBAO},\theta}\right)$$

$$\mathcal{H}_1 : \exists\, \theta \in \Theta \ \text{s.t.}\ \hat{\xi} \sim \mathcal{N}\left(\xi_{\mathrm{BAO},\theta},\, C_{\mathrm{BAO},\theta}\right)$$

The way to rigorously compute the p-value for a measurement Δχ² = x is to consider the "worst-case" H0 model



$$p(x) = \max_{\theta \in \Theta} P\left(\Delta\chi^2 \geq x \mid \mathcal{H}_0, \theta\right) \qquad (22)$$



So for every model in H0, the p-value of the measurement Δχ² = x is less than p(x). When considering the significance s(x)σ corresponding to this p-value, we get that every model in H0 is at least rejected at s(x)σ. Note that this is the best significance we can get to reject all H0 models simultaneously.

This way of obtaining the significance does not rely on any assumption, unlike in the classical χ² method. However it requires more work to compute the distribution of Δχ² under every H0 model. We will see how precisely this can be achieved in section 7.4.2 when applying the procedure on simulations.
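As an illustration of the worst-case rule, here is a minimal Monte Carlo sketch. It assumes that samples of the statistic have already been simulated for each H0 model; the dictionary `samples`, its keys and the two stand-in distributions are hypothetical, and the conversion to a number of σ assumes the two-sided Gaussian convention p = 2Φ(−s).

```python
import numpy as np
from statistics import NormalDist

def worst_case_p_value(x, samples_by_model):
    """p(x) = max over H0 models of the Monte Carlo estimate of P(stat >= x)."""
    return max(float(np.mean(s >= x)) for s in samples_by_model.values())

def p_value_to_sigma(p):
    """Number of sigma s such that p = 2 * Phi(-s); requires 0 < p < 1."""
    return -NormalDist().inv_cdf(p / 2.0)

rng = np.random.default_rng(1)
# Hypothetical stand-ins for the simulated statistic under two H0 models.
samples = {
    "model_a": rng.chisquare(1, size=10_000),
    "model_b": 2.0 * rng.chisquare(1, size=10_000),
}
p = worst_case_p_value(4.0, samples)
print(p, p_value_to_sigma(p))
```

With a finite number of realizations per model, the estimate of p(x) cannot resolve p-values much below one over that number, so very high significances are not quantified by such a sketch.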

The second modification we propose is on the statistic itself. As we saw in section 4.3, the optimal statistic to test simple hypotheses H0 and H1 is the likelihood ratio. However when working with composite hypotheses, likelihoods are not well defined. Indeed they can be defined for any model in H0 or H1 but not for H0 and H1 themselves. The idea of the Δχ² statistic is that it can be thought of as a generalized likelihood ratio between composite hypotheses. Indeed in the special case of a constant covariance matrix we have

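$$\Delta\chi^2 = \min_\theta \chi^2_{\mathrm{noBAO},\theta} - \min_\theta \chi^2_{\mathrm{BAO},\theta}$$

$$= -2\left[\max_\theta \ln\left(\mathcal{L}_{\mathrm{noBAO},\theta}\right) - \max_\theta \ln\left(\mathcal{L}_{\mathrm{BAO},\theta}\right)\right]$$

$$= -2\ln\left[\frac{\max_\theta \mathcal{L}_{\mathrm{noBAO},\theta}}{\max_\theta \mathcal{L}_{\mathrm{BAO},\theta}}\right] \qquad (23)$$

where we used the relation between χ² and the likelihood of equation (32). This is only valid for a constant covariance matrix. To extend this idea to the case of varying covariance matrices, we simply consider the statistic

$$\Delta l = -2\left[\max_\theta \ln\left(\mathcal{L}_{\mathrm{noBAO},\theta}\right) - \max_\theta \ln\left(\mathcal{L}_{\mathrm{BAO},\theta}\right)\right] \qquad (24)$$

We use the notation Δl because it refers to a difference of log-likelihoods. It is a slight abuse of notation since it is not strictly speaking a difference of log-likelihoods. Unlike Δχ², Δl is still a generalized likelihood ratio for varying covariance matrices. So it should give better results in this case, as we verify in section 7.5.

In the case of varying covariance matrices the likelihoods write

$$\mathcal{L}_{\mathrm{BAO},\theta} \propto \left|C_{\mathrm{BAO},\theta}\right|^{-1/2} \exp\left(-\tfrac{1}{2}\chi^2_{\mathrm{BAO},\theta}\right) \qquad (25)$$

$$\mathcal{L}_{\mathrm{noBAO},\theta} \propto \left|C_{\mathrm{noBAO},\theta}\right|^{-1/2} \exp\left(-\tfrac{1}{2}\chi^2_{\mathrm{noBAO},\theta}\right) \qquad (26)$$

where χ²_BAO,θ and χ²_noBAO,θ are computed by taking into account variations of the covariance matrix

$$\chi^2_{\mathrm{BAO},\theta} = \left\langle \hat{\xi} - \xi_{\mathrm{BAO},\theta},\; C^{-1}_{\mathrm{BAO},\theta}\left(\hat{\xi} - \xi_{\mathrm{BAO},\theta}\right) \right\rangle \qquad (27)$$

$$\chi^2_{\mathrm{noBAO},\theta} = \left\langle \hat{\xi} - \xi_{\mathrm{noBAO},\theta},\; C^{-1}_{\mathrm{noBAO},\theta}\left(\hat{\xi} - \xi_{\mathrm{noBAO},\theta}\right) \right\rangle \qquad (28)$$

Let us write l_noBAO,θ and l_BAO,θ for −2 ln(L_noBAO,θ) and −2 ln(L_BAO,θ); we get

$$\Delta l = \min_\theta l_{\mathrm{noBAO},\theta} - \min_\theta l_{\mathrm{BAO},\theta} \qquad (29)$$

$$l_{\mathrm{BAO},\theta} = \chi^2_{\mathrm{BAO},\theta} + \ln\left|C_{\mathrm{BAO},\theta}\right| + \mathrm{cst} \qquad (30)$$

$$l_{\mathrm{noBAO},\theta} = \chi^2_{\mathrm{noBAO},\theta} + \ln\left|C_{\mathrm{noBAO},\theta}\right| + \mathrm{cst} \qquad (31)$$

with the same additive constant for l_BAO,θ and l_noBAO,θ, which can be taken as 0. Note that Δl is not equivalent to Δχ² even if χ²_BAO,θ and χ²_noBAO,θ are computed using equations (27) and (28). Indeed in the case of varying covariance matrices, one has to take into account variations of the matrix determinant as in equations (30) and (31).

We will refer to the Δl method for the method modified as we suggested: replacing Δχ² by Δl and modifying the procedure to obtain the correct significance.

6. COSMOLOGICAL PARAMETERS CONSTRAINTS

Let us describe the second use of BAOs, which can help constrain cosmological parameters. Here the true cosmological model is supposed to be in H1.

6.1. Constraints with constant covariance matrix

In the case of a constant covariance matrix the hypothesis H1 is given by

$$\mathcal{H}_1 : \exists\, \theta \in \Theta \ \text{s.t.}\ \hat{\xi} \sim \mathcal{N}\left(\xi_{\mathrm{BAO},\theta},\, C\right)$$

For a given model in H1, the measurement ξ̂ is Gaussian and χ²_BAO,θ is equivalent to the log-likelihood

$$\chi^2_{\mathrm{BAO},\theta} = -\ln\left((2\pi)^n |C|\right) - 2\ln\left(\mathcal{L}_{\mathrm{BAO},\theta}(\hat{\xi})\right) \qquad (32)$$

To define the posterior probability p(θ | ξ̂) one needs a prior p(θ) on θ

$$p(\theta \mid \hat{\xi}) \propto p(\theta)\, \mathcal{L}_{\mathrm{BAO},\theta}(\hat{\xi}) \qquad (33)$$

$$\propto p(\theta) \exp\left(-\tfrac{1}{2}\chi^2_{\mathrm{BAO},\theta}\right) \qquad (34)$$

To obtain constraints only coming from ξ̂ one can assume a constant prior p(θ)

$$p(\theta \mid \hat{\xi}) \propto \mathcal{L}_{\mathrm{BAO},\theta}(\hat{\xi}) \qquad (35)$$

Note that this choice is still arbitrary because it is linked to a given parameterization. Indeed a constant prior p(θ) can lead to a non-constant prior for a different parameterization.

To combine constraints from ξ̂ with the ones from other independent experiments, one has to modify the prior. For example with CMB data the posterior is given by

$$p(\theta \mid \mathrm{CMB}, \hat{\xi}) \propto p(\theta, \mathrm{CMB}, \hat{\xi})$$

$$\propto p(\theta, \mathrm{CMB})\, p(\hat{\xi} \mid \theta, \mathrm{CMB})$$

$$\propto p(\theta \mid \mathrm{CMB})\, \mathcal{L}_{\mathrm{BAO},\theta}(\hat{\xi}) \qquad (36)$$

where we used the independence of ξ̂ and the CMB measurement. Adding the CMB measurement is thus equivalent to using a prior p(θ) = p(θ | CMB).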

Again we consider the parameters θ given by θ = (Ω_m h², α, b). The parameter α accounts for dilation of the correlation function and b corresponds to a multiplicative factor b². The correlation function models are thus given by

$$\xi_{\mathrm{BAO},\theta}(r) = b^2\, \xi_{\mathrm{BAO},\Omega_m h^2}(\alpha r) \qquad (37)$$
The parameter α is linked to the dilation scale D_V(z) at the mean redshift z of the sample by the relation α = D_V(z)/D_V,fid(z), with D_V,fid(z) the dilation scale for the fiducial cosmology used to construct the 3D data catalogue. The dilation scale expresses how distances dilate when modifying the fiducial cosmology (Eisenstein et al. 2005). It depends on the Hubble parameter (line-of-sight dilation) and the transverse comoving distance D_M(z) (transverse dilation)

$$D_V(z) = \left[D_M(z)^2\, \frac{cz}{H(z)}\right]^{1/3} \qquad (38)$$
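For concreteness, equation (38) and the ratio α can be evaluated with a short sketch. This is only an illustration under stated assumptions: a flat ΛCDM background with radiation neglected, so that D_M equals the line-of-sight comoving distance, and hypothetical function names (`hubble`, `dilation_scale`, `alpha`) that are not taken from the paper.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s

def hubble(z, h, omega_m):
    """H(z) in km/s/Mpc for flat LambdaCDM (assumption: radiation neglected)."""
    return 100.0 * h * np.sqrt(omega_m * (1.0 + z) ** 3 + 1.0 - omega_m)

def transverse_comoving_distance(z, h, omega_m, n_steps=2001):
    """D_M(z) in Mpc via trapezoidal integration of c / H(z')."""
    zs = np.linspace(0.0, z, n_steps)
    f = C_KM_S / hubble(zs, h, omega_m)
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(zs)))

def dilation_scale(z, h, omega_m):
    """D_V(z) = [D_M(z)^2 * c z / H(z)]^(1/3), in Mpc."""
    d_m = transverse_comoving_distance(z, h, omega_m)
    return float((d_m ** 2 * C_KM_S * z / hubble(z, h, omega_m)) ** (1.0 / 3.0))

def alpha(z, cosmo, fiducial):
    """alpha = D_V(z) / D_V,fid(z); cosmo and fiducial are (h, omega_m) pairs."""
    return dilation_scale(z, *cosmo) / dilation_scale(z, *fiducial)

fid = (0.7, 0.27)                      # fiducial (h, Omega_m) used in this paper
print(dilation_scale(0.3, *fid))       # D_V at z = 0.3 in Mpc
print(alpha(0.3, (0.7, 0.30), fid))    # a denser universe gives alpha < 1
```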



One is interested in constraining Ω_m h² and D_V(z). We consider a constant prior p(θ) to obtain constraints only from ξ̂. The posterior p(Ω_m h², α | ξ̂) is obtained by marginalizing over the multiplicative factor B = b²

$$p(\Omega_m h^2, \alpha \mid \hat{\xi}) \propto \int \mathcal{L}_{\mathrm{BAO},(\Omega_m h^2, \alpha, B)}(\hat{\xi})\, dB \qquad (39)$$

$$\propto \int \exp\left(-\tfrac{1}{2}\chi^2_{\mathrm{BAO},(\Omega_m h^2, \alpha, B)}\right) dB \qquad (40)$$

The posterior of Ω_m h² is obtained by marginalizing p(Ω_m h², α | ξ̂) over α, and the posterior of α by marginalizing over Ω_m h². For each parameter, the maximum in the posterior gives the parameter estimate and the standard deviation can give a 1σ interval.
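The marginalization of equations (39) and (40) can be sketched on a grid as follows. This is a toy illustration, not the authors' pipeline: the list `templates` stands in for the (Ω_m h², α) grid with one hypothetical template per grid point, the covariance is a toy diagonal matrix, and the integral over B is replaced by a sum on a uniform grid.

```python
import numpy as np

def marginal_posterior(xi_hat, templates, cinv, b2_grid):
    """Posterior over template index after summing exp(-chi2/2) over B = b^2."""
    log_like = np.empty((len(templates), len(b2_grid)))
    for i, t in enumerate(templates):
        for j, big_b in enumerate(b2_grid):
            d = xi_hat - big_b * t
            log_like[i, j] = -0.5 * (d @ cinv @ d)
    log_like -= log_like.max()             # guard against underflow
    post = np.exp(log_like).sum(axis=1)    # uniform grid: sum is proportional to the integral
    return post / post.sum()

r = np.linspace(25.0, 195.0, 18)           # toy bin centres
peaks = [95.0, 105.0, 115.0]               # toy stand-in for the parameter grid
templates = [(r / 100.0) ** -2 + 0.3 * np.exp(-0.5 * ((r - p) / 10.0) ** 2)
             for p in peaks]
cinv = np.linalg.inv(0.25 * np.eye(r.size))
b2_grid = np.arange(4.0, 9.0 + 1e-9, 0.25)

xi_hat = 6.25 * templates[1]               # noise-free "measurement" from the middle model
post = marginal_posterior(xi_hat, templates, cinv, b2_grid)
print(post)                                # should peak at index 1
```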

6.2. Constraints with varying covariance matrix 

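In the case of a varying covariance matrix the hypothesis H1 is given by

$$\mathcal{H}_1 : \exists\, \theta \in \Theta \ \text{s.t.}\ \hat{\xi} \sim \mathcal{N}\left(\xi_{\mathrm{BAO},\theta},\, C_{\mathrm{BAO},\theta}\right)$$

In this case one has to take into account the dependence of the covariance matrix on the model

$$\chi^2_{\mathrm{BAO},\theta} = \left\langle \hat{\xi} - \xi_{\mathrm{BAO},\theta},\; C^{-1}_{\mathrm{BAO},\theta}\left(\hat{\xi} - \xi_{\mathrm{BAO},\theta}\right) \right\rangle \qquad (41)$$

$$\mathcal{L}_{\mathrm{BAO},\theta}(\hat{\xi}) \propto \left|C_{\mathrm{BAO},\theta}\right|^{-1/2} \exp\left(-\tfrac{1}{2}\chi^2_{\mathrm{BAO},\theta}\right) \qquad (42)$$

Let us consider the simple dependence C_BAO,θ ∝ b⁴ C as an illustration again. In this case, the marginalization over b² gives a different result compared to the result with constant covariance. So the obtained posteriors of Ω_m h² and α are also different. We will see with simulations in section 7.6 that this indeed changes the constraints.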

7. TESTS ON SIMULATIONS 
7.1. Simulations 

We use the same procedure for generating lognormal simulations of the SDSS DR7 LRG sample as in Labatie et al. (2011) with only small differences in the input power spectrum. Because we consider a volume-limited LRG sample with only the northern contiguous region, the volume is approximately half of the full LRG sample and the number of galaxies a third. In particular the expected detection significance is lower than for the full LRG sample. So our focus is not on the expected detection significance for current surveys, but rather on comparing the different methods.

We use a ΛCDM power spectrum given by the iCosmo software (Refregier et al. 2011), with parameters h = 0.7, Ω_b = 0.045, Ω_m = 0.27, Ω_Λ = 0.73, n_s = 1.0, σ_8 = 0.8, and taken at redshift z = 0.3. The transfer function is the form with wiggles of Eisenstein & Hu (1998) and the non-linear correction to the power spectrum is obtained using the fitting formula of Smith et al. (2003). We model the non-linear degradation of the acoustic peak by multiplying the part of the power spectrum containing the oscillations by the function exp(−a² k²/2) with a = 7 h⁻¹ Mpc (i.e. smoothing the oscillations in the correlation with a Gaussian of width a). This is found to be a good approximation in Eisenstein et al. (2007). It consists more precisely in constructing the power spectrum using the forms with and without wiggles

$$P(k) = P_{\mathrm{nowig}}(k) + \exp\left(-a^2 k^2/2\right)\left[P_{\mathrm{wig}}(k) - P_{\mathrm{nowig}}(k)\right] \qquad (43)$$
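Equation (43) is straightforward to apply once the two spectra are tabulated on the same k grid. The sketch below only illustrates the formula; the smooth spectrum `p_nowig` and the oscillating spectrum `p_wig` are hypothetical toy shapes, not the Eisenstein & Hu (1998) forms used in the paper.

```python
import numpy as np

def damped_power_spectrum(k, p_wig, p_nowig, a=7.0):
    """P(k) = P_nowig + exp(-a^2 k^2 / 2) * (P_wig - P_nowig); a in Mpc/h, k in h/Mpc."""
    return p_nowig + np.exp(-0.5 * (a * k) ** 2) * (p_wig - p_nowig)

k = np.logspace(-3, 0, 200)                            # h/Mpc
p_nowig = 1e4 * k / (1.0 + (k / 0.02) ** 2.5)          # toy smooth spectrum
p_wig = p_nowig * (1.0 + 0.05 * np.sin(105.0 * k))     # toy oscillations

p = damped_power_spectrum(k, p_wig, p_nowig)
# On large scales the wiggles are untouched; on small scales they are erased.
print(p[0] / p_wig[0], p[-1] / p_nowig[-1])
```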

We further apply a constant bias b² with b = 2.5 to this power spectrum so that the corresponding correlation function matches the one estimated on the SDSS LRG sample. Our simulations do not take into account the scale-dependence of the galaxy bias. However this is not a problem here since we only use simulations, and do not analyze real data.

We estimated the correlation function on 2000 independent simulations using each time the Landy-Szalay estimator and 100 000 random points. With this procedure Landy-Szalay has been shown to be the estimator with minimum variance, and to be nearly unbiased (see Labatie et al. (2011)).

We estimated the covariance matrix by the empirical covariance matrix of the measured correlation function in the simulations. We use bins of size 10 h⁻¹ Mpc and perform the analysis in the range 20 to 200 h⁻¹ Mpc. In this way we obtain n = 18 bins, corresponding to n(n+1)/2 = 171 free parameters in the covariance matrix. This is much smaller than the number of simulations so that the empirical covariance matrix gives a good estimate of the true covariance matrix (see e.g. Pope & Szapudi (2008)). Another reason for not using a finer binning is that most methods use the inverse of the covariance matrix. Since the bins are very correlated at small separation, the inverse matrix oscillates strongly when the bins are too small, and the result becomes non-robust to small modeling errors.
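The binning and the empirical covariance estimate described above can be sketched as follows. The simulated measurements here are drawn from a hypothetical toy covariance (`true_cov`), standing in for the correlation functions measured on the lognormal simulations.

```python
import numpy as np

rng = np.random.default_rng(2)
n_sims = 2000
edges = np.arange(20.0, 201.0, 10.0)              # bin edges from 20 to 200 Mpc/h
centers = 0.5 * (edges[1:] + edges[:-1])
n_bins = centers.size                             # 18 bins
n_free = n_bins * (n_bins + 1) // 2               # free parameters of a symmetric matrix

# Hypothetical "true" covariance with correlated neighbouring bins.
true_cov = 1e-6 * np.exp(-np.abs(centers[:, None] - centers[None, :]) / 30.0)
xi_sims = rng.multivariate_normal(np.zeros(n_bins), true_cov, size=n_sims)

cov_hat = np.cov(xi_sims, rowvar=False)           # empirical covariance, one row per simulation
print(n_bins, n_free, cov_hat.shape)
```

With 2000 simulations and 171 free parameters, the relative error on each diagonal element is of order sqrt(2/2000), i.e. a few percent in this toy setting.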

We find a small statistical bias in the correlation function estimation, due to the limited resolution of the lognormal simulations or the integral constraint (see Labatie et al. (2011)). This bias is negligible in absolute value (≈ 5 × 10⁻⁴), but when multiplied by C⁻¹ as in Z_w, χ²_BAO,θ or χ²_noBAO,θ it can slightly affect the result. Here the lognormal simulations will only be used for computing the covariance matrix and verifying the Gaussianity of ξ̂. However when working with real data, one should verify that the integral constraint is not biasing results.

7.2. Models 

Unless otherwise stated, we use for the BAO and no-BAO models in H1 and H0 the transfer function of Eisenstein & Hu (1998) with respectively the form with wiggles and without wiggles. At some points we also quote the results obtained with no-BAO models constructed with zero baryon.

The procedure for generating the power spectra is the same as for the lognormal simulations. Obviously the non-linear degradation of the BAO peak is only performed for BAO models, by smoothing with a kernel of size a in comoving coordinates.

The respective correlation functions ξ_BAO,θ and ξ_noBAO,θ are obtained from the power spectra by inverse Fourier transform in 3 dimensions, i.e. by a Hankel transform in the isotropic case. The power spectrum has bins with exponential sizes in k (i.e. the ln(k_i)'s are spaced linearly) since it is smooth in that space. For doing the Hankel transform with this spacing we use the FFTLog code². The correlation is finally binned equivalently as when it is estimated by pair counting, i.e. for a bin [r_i − dr/2, r_i + dr/2]

$$\xi(r_i) = \frac{\int_{r_i - dr/2}^{r_i + dr/2} \xi(r)\, r^2\, dr}{\int_{r_i - dr/2}^{r_i + dr/2} r^2\, dr} \qquad (44)$$


We fix the parameters Ω_b h² = 0.0315, n_s = 1.0 and σ_8 = 0.8 as in the lognormal simulations, and choose the same parametrization as before θ = (Ω_m h², α, b) for the ΛCDM correlation functions

$$\xi_{\mathrm{BAO},\theta}(r) = b^2\, \xi_{\mathrm{BAO},\Omega_m h^2}(\alpha r)$$

$$\xi_{\mathrm{noBAO},\theta}(r) = b^2\, \xi_{\mathrm{noBAO},\Omega_m h^2}(\alpha r)$$



To obtain these functions we vary Ω_m h², adjust the value Ω_Λ = 1 − Ω_m, and perform the dilation in α of the correlation function obtained in comoving coordinates. There are different choices for varying Ω_m h², and we choose to keep h = 0.7 constant and vary Ω_m. Another choice would be to keep Ω_m = 0.27 and vary h. These choices lead to different amplitudes of the correlation due to different growth factors at redshift z = 0.3. We verified that our results are only slightly affected by this choice.

Finally the functions are rescaled by a factor b² for modeling the galaxy bias. We have to take into account that no-BAO models with zero baryon have a different amplitude than the other models for the range of scales considered (for the same σ_8). Therefore we rescale them by an amplitude factor of 1.29, which is found to minimize the distance between the no-BAO and BAO models for the parameter values of the lognormal simulations

$$\left\langle \xi_{\mathrm{BAO}} - \xi_{\mathrm{noBAO}},\; C^{-1}\left(\xi_{\mathrm{BAO}} - \xi_{\mathrm{noBAO}}\right) \right\rangle \qquad (45)$$

The lognormal simulations correspond to parameters Ω_m h² = 0.1323, α = 1, and b = 2.5. In figure 2 we plot the correlation function of the simulations, for the two corresponding no-BAO models and for different values of Ω_m h². The no-wiggles model has only the BAO peak smoothed out, whereas the zero-baryon model (with Ω_b = 0) has a different global shape. Among BAO models, Ω_m h² controls the proportion of baryons in the total matter. When Ω_m h² increases, the proportion of baryons decreases, and baryonic effects such as the BAO peak are reduced. On the contrary when Ω_m h² decreases, baryonic effects are amplified.

7.3. Verification of the Gaussianity of ξ̂

In this section we verify the Gaussian hypothesis ξ̂ ~ N(ξ_BAO, C) on our lognormal simulations. Even if we do not expect large differences, the next step would be to verify it on N-body simulations, which are more realistic. First we look at the χ²_BAO statistic

$$\chi^2_{\mathrm{BAO}} = \left\langle \hat{\xi} - \xi_{\mathrm{BAO}},\; C^{-1}\left(\hat{\xi} - \xi_{\mathrm{BAO}}\right) \right\rangle = \sum_{1 \leq i,j \leq n} \left(\hat{\xi}_i - \xi_{\mathrm{BAO},i}\right) C^{-1}_{ij} \left(\hat{\xi}_j - \xi_{\mathrm{BAO},j}\right)$$

With the Gaussian hypothesis ξ̂ ~ N(ξ_BAO, C), χ²_BAO follows a chi-square distribution with n degrees of freedom

$$\chi^2_{\mathrm{BAO}} \sim \chi^2_n \qquad (46)$$

We compare the histogram of χ²_BAO on our lognormal simulations to the probability density function (pdf) of a χ²_n variable where n = 18. We show in figure 3 the very good agreement between the two distributions.

2 http://casa.colorado.edu/~ajsh/FFTLog/

FIG. 2. - Correlation function ξ(r) as observed in the fiducial cosmology. In inset we plot r² ξ(r) for a better visualization. For the lognormal simulation parameters (Ω_m h², α, b) = (0.1323, 1, 2.5), we plot the BAO model (black), the no-wiggles model (blue), and zero-baryon model (purple). The no-wiggles model has just the BAO peak smoothed out, whereas the zero-baryon model has a different global shape. We also plot the lognormal simulations mean and error bars (red), which shows that the simulations are very precise. Finally we plot two other BAO models by changing Ω_m h² = 0.1423 (orange) and Ω_m h² = 0.1223 (green). Increasing Ω_m h² reduces the baryonic effects such as the BAO peak, whereas decreasing Ω_m h² amplifies these effects.

FIG. 3. - Estimated pdf of χ²_BAO (black) using the histogram on 2000 lognormal simulations and pdf of a χ²_18 distribution (red). Error bars give the Poisson error in the estimate due to the finite number of simulations.
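The χ² check of this section can be sketched as follows. The sketch draws Gaussian realizations directly, so it only illustrates the comparison itself; the covariance `cov` and the expected correlation `xi_model` are hypothetical toy inputs, whereas the paper applies the test to lognormal simulations.

```python
import numpy as np

rng = np.random.default_rng(3)
n, n_sims = 18, 2000
idx = np.arange(n)
cov = 0.25 * 0.5 ** np.abs(idx[:, None] - idx[None, :])   # toy correlated covariance
xi_model = np.zeros(n)                                     # toy expected correlation

sims = rng.multivariate_normal(xi_model, cov, size=n_sims)
cinv = np.linalg.inv(cov)
d = sims - xi_model
chi2 = np.einsum("ij,jk,ik->i", d, cinv, d)                # one chi2 value per simulation

# A chi2 variable with n degrees of freedom has mean n and variance 2n.
print(chi2.mean(), chi2.var())
```

A histogram of `chi2` can then be compared with the χ²_n pdf, as done in figure 3; departures of the mean from n or of the variance from 2n would signal non-Gaussianity or a wrong covariance.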

FIG. 4. - Estimated pdf of Z_w for the mexican hat filter with parameters R = 113.6 h⁻¹ Mpc, s = 20 h⁻¹ Mpc using the histogram on 2000 lognormal simulations (black), and pdf of a standard Gaussian centered on Z̄_w (red). Error bars give the Poisson error in the estimate due to the finite number of simulations.

FIG. 5. - Estimated pdf of Z_w for the BAOlet filter with parameters R = 116 h⁻¹ Mpc, s = 36 h⁻¹ Mpc using the histogram on 2000 lognormal simulations (black), and pdf of a standard Gaussian centered on Z̄_w (red). Error bars give the Poisson error in the estimate due to the finite number of simulations.

For the next tests, we look at wavelet methods of section 3. In these methods we obtain a wavelet response S_w(R, s) and a Z-score Z_w(R, s) for every parameter (R, s)

$$S_w(R, s) = \left\langle w(R, s),\, \hat{\xi} \right\rangle = \sum_{i=1}^{n} w_i(R, s)\, \hat{\xi}(r_i)$$

$$Z_w(R, s) = \frac{S_w(R, s)}{\sigma_w(R, s)}$$



We only consider the mexican hat filter with parameters R = 113.6 h⁻¹ Mpc, s = 20 h⁻¹ Mpc and the BAOlet filter with parameters R = 116 h⁻¹ Mpc, s = 36 h⁻¹ Mpc. These parameters maximize the Z-score obtained on the data measurement, in the respective studies Tian et al. (2011) and Arnalte-Mur et al. (2011). In order to obtain Z_w on our simulations, we compute the noise σ_w of S_w using the covariance matrix of the simulations

$$\sigma_w = \sqrt{\left\langle w,\, C w \right\rangle} \qquad (47)$$

With the Gaussian hypothesis ξ̂ ~ N(ξ_BAO, C), Z_w is Gaussian with mean E[Z_w] and standard deviation equal to 1. We plot in figures 4 and 5 the histogram of Z_w on our lognormal simulations, respectively for the mexican hat and for the BAOlet filter. As we already mentioned, there is a small bias between the mean Z̄_w on simulations and E[Z_w]. Here we are only interested in the Gaussianity of Z_w and not in this small bias, so we compare the histogram to the pdf of a Gaussian N(Z̄_w, 1). Again we find a very good agreement between the simulations and the Gaussian prediction.
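The Z-score computation with the noise of equation (47) can be sketched as below. The weights `w` are a hypothetical Mexican-hat-shaped vector chosen for illustration only; they are not the actual mexican hat or BAOlet filters of the cited studies, and the covariance and signal are toy inputs.

```python
import numpy as np

def z_score(w, xi_hat, cov):
    """Z_w = <w, xi_hat> / sqrt(<w, C w>); xi_hat may be 1D or (n_sims, n_bins)."""
    return (xi_hat @ w) / np.sqrt(w @ cov @ w)

rng = np.random.default_rng(4)
n = 18
r = np.linspace(25.0, 195.0, n)
idx = np.arange(n)
cov = 0.25 * 0.5 ** np.abs(idx[:, None] - idx[None, :])   # toy covariance
u = (r - 110.0) / 20.0
w = (1.0 - u ** 2) * np.exp(-0.5 * u ** 2)                # toy Mexican-hat-shaped weights
xi_model = 0.3 * np.exp(-0.5 * u ** 2)                    # toy bump in the correlation

sims = rng.multivariate_normal(xi_model, cov, size=2000)
z = z_score(w, sims, cov)                                 # one Z-score per simulation
print(z.mean(), z.std())                                  # std should be close to 1
```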



7.4. BAO detection with constant covariance matrix 

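7.4.1. Classical χ² method

The tested hypotheses are given by

$$\mathcal{H}_0 : \exists\, \theta \in \Theta \ \text{s.t.}\ \hat{\xi} \sim \mathcal{N}\left(\xi_{\mathrm{noBAO},\theta},\, C\right)$$

$$\mathcal{H}_1 : \exists\, \theta \in \Theta \ \text{s.t.}\ \hat{\xi} \sim \mathcal{N}\left(\xi_{\mathrm{BAO},\theta},\, C\right)$$

The Gaussian hypothesis on ξ̂ is well justified (at least for our lognormal simulations) as we have seen in section 7.3. The chi-square quantities are functions of θ

$$\chi^2_{\mathrm{BAO},\theta} = \left\langle \hat{\xi} - \xi_{\mathrm{BAO},\theta},\; C^{-1}\left(\hat{\xi} - \xi_{\mathrm{BAO},\theta}\right) \right\rangle$$

$$\chi^2_{\mathrm{noBAO},\theta} = \left\langle \hat{\xi} - \xi_{\mathrm{noBAO},\theta},\; C^{-1}\left(\hat{\xi} - \xi_{\mathrm{noBAO},\theta}\right) \right\rangle$$

An extended model of correlation function is implicitly defined, which mixes the BAO and the no-BAO models, e.g.

$$\xi_{\beta,\theta} = \beta\, \xi_{\mathrm{BAO},\theta} + (1 - \beta)\, \xi_{\mathrm{noBAO},\theta}$$

Let us write as before

$$\Delta\chi^2 = \min_\theta \chi^2_{\mathrm{noBAO},\theta} - \min_\theta \chi^2_{\mathrm{BAO},\theta}$$

$$\Delta\chi^2_{\mathrm{global}} = \min_\theta \chi^2_{\mathrm{noBAO},\theta} - \min_{\beta,\theta} \chi^2_{\beta,\theta}$$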

The basic assumption of the method is that Δχ²_global follows a χ²_1 distribution under H0. Since we have Δχ² ≤ Δχ²_global by construction, a conservative estimate of the significance is given by sqrt(Δχ²) σ when Δχ² > 0.

We recall that Δχ²_global ~ χ²_1 is subject to the assumption that the spaces of model correlation functions (ξ_noBAO,θ)_{θ∈Θ} and (ξ_β,θ)_{β∈R, θ∈Θ} are affine. Since this is not easy to verify, we want to test with our simulations that we have indeed Δχ²_global ~ χ²_1 under H0.

For a model θ in H0, we generate realizations as

$$\hat{\xi} = C^{1/2} g + \xi_{\mathrm{noBAO},\theta} \qquad (48)$$

where g is a standard multivariate Gaussian. For each realization, we find the best-fit model by testing all the Ω_m h² and α values on a grid. The remaining parameters, which are the bias b² and the parameter β, are found analytically for the best-fit.
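A minimal sketch of this procedure is given below, under simplifying assumptions: C^{1/2} is taken to be the Cholesky factor of C (one valid choice of square root), the (Ω_m h², α) grid is replaced by a short list of hypothetical toy templates, and only the amplitude B = b² ≥ 0 is solved analytically (the mixing parameter β is not included).

```python
import numpy as np

def realizations(xi_model, cov, n_real, rng):
    """xi_hat = C^{1/2} g + xi_model, with g a standard multivariate Gaussian."""
    chol = np.linalg.cholesky(cov)                 # chol @ chol.T == cov
    g = rng.standard_normal((n_real, xi_model.size))
    return g @ chol.T + xi_model

def best_fit_chi2(xi_hat, templates, cinv):
    """Grid search over templates; amplitude B >= 0 found analytically for each."""
    best = np.inf
    for t in templates:
        big_b = max((t @ cinv @ xi_hat) / (t @ cinv @ t), 0.0)
        d = xi_hat - big_b * t
        best = min(best, float(d @ cinv @ d))
    return best

rng = np.random.default_rng(5)
n = 18
r = np.linspace(25.0, 195.0, n)
idx = np.arange(n)
cov = 0.25 * 0.5 ** np.abs(idx[:, None] - idx[None, :])     # toy covariance
cinv = np.linalg.inv(cov)
nobao = [(r / 100.0) ** -s for s in (1.8, 2.0, 2.2)]        # toy no-BAO grid
bump = 0.3 * np.exp(-0.5 * ((r - 105.0) / 10.0) ** 2)
bao = [t + bump for t in nobao]                             # toy BAO grid

sims = realizations(6.25 * nobao[1], cov, 2000, rng)        # realizations of one H0 model
dchi2 = np.array([best_fit_chi2(x, nobao, cinv) - best_fit_chi2(x, bao, cinv)
                  for x in sims])
print(dchi2.mean())    # empirical mean of Delta chi^2 under this H0 model
```

The empirical distribution of `dchi2` obtained this way is what gets compared with the χ²_1 prediction.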

For the grid (Ω_m h², α) we take Ω_m h² ∈ [0.0423, 0.2923] with grid step 0.005 and α ∈ [0.5, 1.5] with grid step 0.01. We also allow any b² > 0 and β. We test two models in H0, with values Ω_m h² = 0.1323, α = 1.0 as in the lognormal simulations, and with other values Ω_m h² = 0.0823, α = 0.9. Each time we generate 10 000 realizations using equation (48) to estimate the distribution of Δχ²_global and Δχ². We show in tables 1 and 2, for different thresholds t, the p-value and corresponding significance for a χ²_1 variable, for Δχ²_global, and for Δχ². Our results show that the assumption Δχ²_global ~ χ²_1 is clearly wrong. In particular, the means of Δχ²_global for the two different models are respectively 2.85 and 2.23, whereas the mean of a χ²_1 variable is equal to 1.

In both cases one grossly overestimates the significance when identifying Δχ²_global with a χ²_1 distribution. The fact that the classical χ² method is conservative because it uses the value of Δχ² instead of Δχ²_global can compensate this overestimation. For one H0 model, identifying Δχ² with a χ²_1 distribution still gives a small overestimation of the significance. For the other H0 model, it gives an underestimation of the significance.

Let us stress that these significances are not strictly speaking the significances of the BAO detection, but only of the rejection of particular H0 models. Indeed the BAO detection consists in the rejection of all H0 models simultaneously.



TABLE 1

               χ²_1               Δχ²_global          Δχ²
P(X ≥ 1.0)     0.32 (1σ)          0.81 (0.25σ)        0.39 (0.85σ)
P(X ≥ 2.25)    0.13 (1.5σ)        0.51 (0.65σ)        0.18 (1.35σ)
P(X ≥ 4.0)     4.5×10⁻² (2σ)      0.23 (1.2σ)         6.8×10⁻² (1.8σ)
P(X ≥ 6.25)    1.2×10⁻² (2.5σ)    8.3×10⁻² (1.75σ)    2.1×10⁻² (2.3σ)
P(X ≥ 9.0)     2.7×10⁻³ (3σ)      1.9×10⁻² (2.35σ)    4.3×10⁻³ (2.85σ)

NOTES. - p-values and corresponding significances for different distributions and for the rejection of the particular H0 model with Ω_m h² = 0.1323, α = 1.0. We show the χ²_1 distribution and the Δχ²_global, Δχ² distributions. The assumption Δχ²_global ~ χ²_1 is wrong with a significance that is grossly overestimated. The fact that the classical χ² method uses Δχ² instead of Δχ²_global compensates the overestimation. For the rejection of this H0 model, there is still a small overestimation of the significance if we identify Δχ² with a χ²_1 distribution.

TABLE 2

               χ²_1               Δχ²_global          Δχ²
P(X ≥ 1.0)     0.32 (1σ)          0.66 (0.45σ)        8.0×10⁻² (1.75σ)
P(X ≥ 2.25)    0.13 (1.5σ)        0.37 (0.9σ)         4.2×10⁻² (2.05σ)
P(X ≥ 4.0)     4.5×10⁻² (2σ)      0.17 (1.4σ)         1.6×10⁻² (2.4σ)
P(X ≥ 6.25)    1.2×10⁻² (2.5σ)    5.5×10⁻² (1.9σ)     5.7×10⁻³ (2.75σ)
P(X ≥ 9.0)     2.7×10⁻³ (3σ)      1.3×10⁻² (2.5σ)     1.8×10⁻³ (3.1σ)

NOTES. - Same as table 1 for the rejection of the particular H0 model with Ω_m h² = 0.0823, α = 0.9. Again the assumption Δχ²_global ~ χ²_1 is wrong with a significance that is grossly overestimated. The fact that the classical χ² method uses Δχ² instead of Δχ²_global compensates the overestimation. For the rejection of this H0 model, the significance becomes underestimated if we identify Δχ² with a χ²_1 distribution.

When using zero-baryon models for H0 we find the same qualitative results. We find that Δχ²_global is very different from a χ²_1 variable, and that identifying Δχ² with a χ²_1 distribution is also wrong, leading to either an overestimation or an underestimation of the significance for the rejection of particular H0 models.

The assumption Δχ²_global ~ χ²_1 is broken because the spaces of model correlation functions (ξ_noBAO,θ)_{θ∈Θ} and (ξ_β,θ)_{β∈R, θ∈Θ} are not affine. It is easy to see for example that no-BAO correlations are more degenerate than BAO correlations with respect to the three parameters. So for a given range of parameters, the space of BAO correlations is larger than the space of no-BAO correlations, which tends to increase the values of Δχ²_global and Δχ². In this case, spaces of correlations fail to be affine because of their limited extent, which itself is due to the limited range of parameters.

As we saw in section 5, one needs to consider the "worst-case" H0 model to obtain the correct significance. For a realization value Δχ² = x the p-value is given by

$$p(x) = \max_{\theta \in \Theta} P\left(\Delta\chi^2 \geq x \mid \mathcal{H}_0, \theta\right) \qquad (49)$$

We saw in tables 1 and 2 that the significance can be either overestimated or underestimated when rejecting particular H0 models, if we identify Δχ² with a χ²_1 variable. If it is overestimated for a H0 model with parameter θ and for Δχ² = x > 0, it means

$$P\left(\Delta\chi^2 \geq x \mid \mathcal{H}_0, \theta\right) > p_{\chi^2_1}(x) = 2\Phi\left(-\sqrt{x}\right) \qquad (50)$$

The consequence is that the classical χ² method overestimates the significance of the BAO detection. Indeed when using equation (49) for determining the significance of the full H0 rejection, we get

$$p(x) > p_{\chi^2_1}(x) = 2\Phi\left(-\sqrt{x}\right) \qquad (51)$$

When considering varying covariance matrices, the estimate of the significance by the classical χ² method could be even more wrong (see section 4.3 and section 7.5).
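The reference p-value p_χ²_1(x) = 2Φ(−√x) used above is easy to evaluate. The following sketch uses only the Python standard library (the function name `p_chi2_1` is an illustrative choice) and reproduces the χ²_1 column of tables 1 and 2 to within rounding of the last digit.

```python
from math import sqrt
from statistics import NormalDist

def p_chi2_1(x):
    """P(X >= x) for a chi-square variable with 1 degree of freedom: 2 * Phi(-sqrt(x))."""
    return 2.0 * NormalDist().cdf(-sqrt(x))

for x in (1.0, 2.25, 4.0, 6.25, 9.0):
    # sqrt(x) is the number of sigma quoted by the classical chi^2 method
    print(f"P(X >= {x}) = {p_chi2_1(x):.4f}  ({sqrt(x):g} sigma)")
```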



7.4.2. Δl method

In this section we test the modified version of the χ² method that we proposed in section 5, which we called the Δl method. We still consider a constant covariance matrix, i.e. the hypotheses

$$\mathcal{H}_0 : \exists\, \theta \in \Theta \ \text{s.t.}\ \hat{\xi} \sim \mathcal{N}\left(\xi_{\mathrm{noBAO},\theta},\, C\right)$$

$$\mathcal{H}_1 : \exists\, \theta \in \Theta \ \text{s.t.}\ \hat{\xi} \sim \mathcal{N}\left(\xi_{\mathrm{BAO},\theta},\, C\right)$$

There are two modifications that we proposed compared to the classical χ² method. One of the modifications consists in replacing Δχ² by Δl

$$\Delta l = -2\left[\max_\theta \ln\left(\mathcal{L}_{\mathrm{noBAO},\theta}\right) - \max_\theta \ln\left(\mathcal{L}_{\mathrm{BAO},\theta}\right)\right] \qquad (52)$$

We are still in the case of a constant covariance, so the two statistics Δχ² and Δl are equal. We will only see the effect of this change in section 7.5 when we consider varying covariance matrices.

We also modify the procedure for computing the significance, so that we obtain a correct value as in equation (22) or (49). For a realization value Δl = x the p-value is given by

$$p(x) = \max_{\theta \in \Theta} P\left(\Delta l \geq x \mid \mathcal{H}_0, \theta\right) \qquad (53)$$

Let us define a cumulative distribution function corresponding to this p-value

$$F_0(x) = p(x) = \max_{\theta \in \Theta} P\left(\Delta l \geq x \mid \mathcal{H}_0, \theta\right) \qquad (54)$$

So the method requires precomputing F_0 in order to obtain the p-value p(x) for a given measurement Δl = x. Here we consider the range of parameters Ω_m h² ∈ [0.1023, 0.1623] with grid step 0.015, α ∈ [0.8, 1.2] with grid step 0.05, and b² ∈ [4, 9] with grid step 0.25. The grid is not very fine because the computation time is proportional to the square of the grid size, so it increases very rapidly. We tested refining the grid for each parameter and found agreement at the level of a few percent, which is enough for our purpose here.

For each model on the grid we generate 10 000 realizations with equation (48) as before, and compute the statistic Δl. The fact that there are only 10 000 realizations for each H0 model does not make it possible to quantify p-values smaller than approximately 10⁻⁴, i.e. significances higher than 3.85σ. Note that unlike in the χ² method, the imprecision here is only computational and limited to high significance. In particular when working with a data measurement ξ̂, the imprecision only occurs when the significance is high and the BAO detection is already clear.

We perform this procedure both for the H0 hypothesis and for the H1 hypothesis. With H0 realizations we compute the function F_0, and we get the significance obtained with H1 realizations using equation (54). Since H1 is composite, the distribution of Δl is not well-defined under H1. For example we cannot speak about the expected significance obtained under H1. Here we simply consider the expected significance for every H1 model, that we average over all models. This is actually equivalent to the expected significance obtained under H1 when adding a constant prior p(θ) in the hypothesis.

We obtain an average significance of 2.11σ with this procedure. On the other hand, when estimating the significance by sqrt(Δχ²) σ as in the classical χ² method, we obtain an average of 2.33σ (we take the convention that Δχ² < 0 corresponds to 0σ). Let us see the effect of the imprecision at high significance by only considering realizations under the limit of 3.85σ. In this case we obtain an average significance of 1.92σ with the modified procedure, and an average of sqrt(Δχ²) equal to 2.0. Note that this does not mean that the χ² method is better. Indeed it uses the same statistic, so this only means that the significance is overestimated.

Finally we repeat the same computations using zero-baryon models for H0. In this case we expect a larger significance because zero-baryon models not only lack the BAO feature, but also have a different global shape than baryonic models. The expected significance is higher, so we use a higher number of realizations equal to 50 000 for every H0 model. When using the rigorous procedure for estimating the significance, we obtain an average of 2.34σ. On the other hand, when using sqrt(Δχ²) we obtain an average of 2.9. Here the large difference is mostly due to the imprecision at high significance. Indeed if we restrict to significances under the limit of 4.25σ corresponding to this number of realizations, we obtain an average significance of 2σ with our modified procedure, and an average of sqrt(Δχ²) equal to 2.21.

7.5. BAO detection with varying covariance matrix 

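We finally consider the general case where the covariance matrix depends on cosmological parameters

$$\mathcal{H}_0 : \exists\, \theta \in \Theta \ \text{s.t.}\ \hat{\xi} \sim \mathcal{N}\left(\xi_{\mathrm{noBAO},\theta},\, C_{\mathrm{noBAO},\theta}\right)$$

$$\mathcal{H}_1 : \exists\, \theta \in \Theta \ \text{s.t.}\ \hat{\xi} \sim \mathcal{N}\left(\xi_{\mathrm{BAO},\theta},\, C_{\mathrm{BAO},\theta}\right)$$

Our goal is only to illustrate the effect of a varying covariance instead of a constant covariance. So we consider a simple example where the covariance matrix only depends on the amplitude parameter b

$$C_{\mathrm{noBAO},\theta} = C_{\mathrm{BAO},\theta} = \left(\frac{b}{b_0}\right)^4 C \qquad (55)$$

with b_0 = 2.5 the simulation value.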

We apply the Δl method with the same procedure as in section 7.4.2, except that we take into account variations of the covariance matrix in the likelihoods and when generating H0, H1 realizations using equation (48). In this case, the Δl statistic is different from the Δχ² statistic, and is computed using equations (27), (28), (29), (30), and (31).
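To make the role of the determinant term concrete, here is a minimal sketch of the Δl statistic for an amplitude-dependent covariance. It assumes the scaling C_θ = (b/b_0)⁴ C for both hypotheses, searches only over the amplitude B = b² on a grid, and uses hypothetical toy templates; it is an illustration, not the authors' implementation.

```python
import numpy as np

def min_l(xi_hat, template, cov0, b0, b2_grid):
    """min over B = b^2 of l = chi2 + ln|C_theta| (additive constant set to 0),
    assuming C_theta = (b / b0)^4 * cov0."""
    n = xi_hat.size
    cinv0 = np.linalg.inv(cov0)
    logdet0 = np.linalg.slogdet(cov0)[1]
    best = np.inf
    for big_b in b2_grid:
        scale = (big_b / b0 ** 2) ** 2                 # (b / b0)^4
        d = xi_hat - big_b * template
        l_val = (d @ cinv0 @ d) / scale + logdet0 + n * np.log(scale)
        best = min(best, float(l_val))
    return best

def delta_l(xi_hat, t_nobao, t_bao, cov0, b0, b2_grid):
    """Delta l = min l_noBAO - min l_BAO."""
    return (min_l(xi_hat, t_nobao, cov0, b0, b2_grid)
            - min_l(xi_hat, t_bao, cov0, b0, b2_grid))

r = np.linspace(25.0, 195.0, 18)
t_nobao = (r / 100.0) ** -2                            # toy no-BAO template
t_bao = t_nobao + 0.3 * np.exp(-0.5 * ((r - 105.0) / 10.0) ** 2)  # toy BAO template
cov0 = 0.25 * np.eye(r.size)                           # toy covariance at b = b0
b2_grid = np.arange(4.0, 9.0 + 1e-9, 0.25)

xi_hat = 6.25 * t_bao                                  # noise-free BAO "measurement"
print(delta_l(xi_hat, t_nobao, t_bao, cov0, 2.5, b2_grid))
```

When the amplitude grid is reduced to the single value b_0², the determinant terms cancel and Δl reduces to the usual difference of χ² values, which is the constant-covariance case.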



This time we obtain an average significance of 1.96cr 
under M\, which is a bit lower than for a constant co- 
variance. When using the A% 2 statistic we obtain an 
average of 1.59c under M\ with our modified procedure. 
This justifies our choice of replacing A% 2 by Al, which 
can be thought as a generalized likelihood ratio (see sec- 
tion[5]). Using the classical x 2 method, we obtain an 
average of \J~A~~ equal to 2.32. In this case the estimate 
given by the classical x 2 method is very far from the cor- 
rect significance of 1.59cr. As we already mentioned in 



section |4.3 the classical x method cannot be used in 
the caseoi a varying covariance matrix. 

We verify that these conclusions are not due to the
imprecision of our procedure at high significance. When
considering only realizations under the limit of $3.85\sigma$, our
modified procedure gives average significances of $1.89\sigma$
for $\Delta l$ and $1.52\sigma$ for $\Delta\chi^2$, and the average of
$\sqrt{\Delta\chi^2}$ is 2.20.

These results show how the BAO detection depends
on the tested hypotheses $\mathcal{H}_0$, $\mathcal{H}_1$, and on the choice of
the statistic.
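The link between the number of $\mathcal{H}_0$ realizations and the imprecision at high significance can be sketched as follows. This is a minimal illustration and not the procedure of section 7.4.2: it assumes that larger values of the statistic reject $\mathcal{H}_0$ more strongly and that p-values are converted to a number of $\sigma$ with the two-sided Gaussian convention. All function names are hypothetical.

```python
from statistics import NormalDist

def empirical_p_value(stat_obs, stats_h0):
    """Fraction of H0 realizations with a statistic at least as large as observed."""
    n_extreme = sum(1 for s in stats_h0 if s >= stat_obs)
    return n_extreme / len(stats_h0)

def p_to_sigma(p):
    """Two-sided Gaussian convention (an assumption): p = 2 * (1 - Phi(n_sigma)).

    Requires 0 < p < 1; p = 0 cannot be converted, which is exactly the
    high-significance limitation discussed in the text.
    """
    return NormalDist().inv_cdf(1.0 - p / 2.0)

def max_resolvable_sigma(n_realizations):
    """Largest significance distinguishable from p = 0 with n_realizations."""
    return p_to_sigma(1.0 / n_realizations)

print(p_to_sigma(0.0455))           # roughly 2 sigma
print(max_resolvable_sigma(10000))  # ceiling set by the number of realizations
```

Since the smallest non-zero empirical p-value is one over the number of realizations, the ceiling grows only slowly with the computational cost.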



7.6. Effect of varying covariance matrix on cosmological 
parameters constraints 

Finally let us see the effect of a varying covariance
matrix on parameter constraints. As we saw earlier,
we must have a prior $p(\theta)$ for the posterior $p(\theta \mid \hat{\xi})$ to
be well-defined. We consider a constant $p(\theta)$ so that the
constraints only come from the measurement $\hat{\xi}$. Then
the posterior $p(\theta \mid \hat{\xi})$ is given by the likelihood

$$p(\theta \mid \hat{\xi}) \propto \mathcal{L}_{BAO,\theta}(\hat{\xi}) \qquad (56)$$



Changing the covariance matrix is equivalent to changing
the likelihood function. Let us see the effect for a
given measurement $\hat{\xi}$. We use for the illustration the
expected correlation function of the lognormal simulations
$\hat{\xi} = \xi_{BAO,\theta}$ with $\theta = (\Omega_m h^2, \alpha, b) = (0.1323, 1.0, 2.5)$.

We compute the posterior $p(\Omega_m h^2, \alpha \mid \hat{\xi})$ after
marginalizing over the amplitude $b^2$ with $b^2 \in [4, 9]$. We
plot the results in figures 6 and 7, respectively for a constant
covariance matrix and for a varying covariance matrix.
We also plot two lines of constant apparent horizon
at matter-radiation equality $\alpha\,\Omega_m h^2$, and constant
apparent sound horizon $\alpha\,(\Omega_m h^2)^{0.25}$. These would be
degeneracy lines if we focused respectively on small scales
and on the BAO scale. As expected the degeneracy
direction for the constraints lies in between the two lines.




FIG. 6. - Posterior $p(\Omega_m h^2, \alpha \mid \hat{\xi})$ in the case of constant
covariance matrix, with $\hat{\xi} = \xi_{BAO,\theta}$ and $\theta = (\Omega_m h^2, \alpha, b) =
(0.1323, 1.0, 2.5)$ for the illustration. We plot the $1\sigma$ to $5\sigma$
confidence regions with the approximation that $p$ is a 2-dimensional
Gaussian. They correspond respectively to $-2\ln(p) =
-2\ln(p_{max}) + 2.29, 6.16, 11.81, 19.32, 28.74$ (see section "Confidence
Limits on Estimated Model Parameters" in Press et al.
(2007)). We see deviations from a Gaussian posterior because the
contours are not totally elliptical and symmetrical. We also plot
the lines of constant apparent horizon at matter-radiation equality
$\alpha\,\Omega_m h^2$, and constant apparent sound horizon $\alpha\,(\Omega_m h^2)^{0.25}$. As
expected the degeneracy direction of the constraints lies in between
the two lines.
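The contour levels quoted in the caption of figure 6 can be approximately recovered from the 2-dimensional Gaussian relation between enclosed probability and $-2\ln(p/p_{max})$. The sketch below is only a cross-check under that Gaussian approximation: it uses the fact that a 2-dimensional Gaussian encloses a probability $1 - e^{-\Delta/2}$ inside the contour $-2\ln(p/p_{max}) = \Delta$. The function name is hypothetical.

```python
import math

def delta_2d(n_sigma):
    """-2 ln(p / p_max) on the contour enclosing the n_sigma probability (2D Gaussian)."""
    tail = math.erfc(n_sigma / math.sqrt(2.0))  # 1 - confidence level of n_sigma
    return -2.0 * math.log(tail)

for n in range(1, 6):
    print(n, round(delta_2d(n), 2))
```

The values obtained this way agree with those of the caption to within a few hundredths.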

First we notice that the posterior $p(\Omega_m h^2, \alpha \mid \hat{\xi})$ is
much farther from a 2-dimensional Gaussian for a varying
covariance matrix than for a fixed covariance matrix.

Constraints on each parameter $\Omega_m h^2$ and $\alpha$ are obtained
after marginalizing over the other parameter. We
obtain different constraints in the two cases, with a
small shift in the maxima of the posteriors. For a constant
covariance we obtain $\Omega_m h^2 = 0.134 \pm 0.015$ and
$\alpha = 0.995 \pm 0.070$. For a varying covariance we obtain
$\Omega_m h^2 = 0.126 \pm 0.014$ and $\alpha = 0.976 \pm 0.070$.
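A minimal sketch of how such one-parameter constraints can be extracted from a 2-dimensional posterior evaluated on a grid is given below. It assumes that the quoted uncertainties are posterior standard deviations, which the text does not specify, and it uses a toy separable Gaussian posterior with illustrative numbers only. The function name and grids are hypothetical.

```python
import math

def marginal_mean_std(grid_x, post):
    """Mean and standard deviation of x from post[i][j] proportional to p(x_i, y_j).

    Assumes regular grids, so plain sums stand in for integrals.
    """
    marg = [sum(row) for row in post]   # marginalize over y for each x
    norm = sum(marg)
    mean = sum(x * m for x, m in zip(grid_x, marg)) / norm
    var = sum((x - mean) ** 2 * m for x, m in zip(grid_x, marg)) / norm
    return mean, math.sqrt(var)

# Toy posterior: independent Gaussians in x and y (illustrative numbers only).
xs = [0.05 + 0.001 * i for i in range(161)]
ys = [0.6 + 0.005 * j for j in range(161)]
post = [[math.exp(-0.5 * ((x - 0.13) / 0.015) ** 2
                  - 0.5 * ((y - 1.0) / 0.07) ** 2) for y in ys] for x in xs]
mean_x, std_x = marginal_mean_std(xs, post)
print(mean_x, std_x)
```

For a strongly non-Gaussian posterior, as in the varying covariance case, a mean and standard deviation summarize the marginal distribution less faithfully than for the toy example.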

Constraints on individual parameters are dominated
by the degeneracy direction of the 2-dimensional posterior.
In both cases this direction is approximately as
poorly constrained. However the orthogonal direction
is better constrained in the case of varying covariance.
Overall the 2-dimensional constraints on $(\Omega_m h^2, \alpha)$ are
better for this particular example of correlation function.

8. CONCLUSIONS 

We have presented different methods for BAO detection,
and for each of them we detailed the tested $\mathcal{H}_0$, $\mathcal{H}_1$
hypotheses and the underlying assumptions. We show in
table 3 a summary with pros and cons for each method.




FIG. 7. - Same as figure 6 but for varying covariance matrices
$C_{BAO,\theta} = (b/b_0)^4\, C$. In this case the posterior $p(\Omega_m h^2, \alpha \mid \hat{\xi})$ is
much farther from a 2-dimensional Gaussian. The degeneracy
direction is as poorly constrained as in figure 6. However the
orthogonal direction is better constrained. Overall the 2-dimensional
constraints on $(\Omega_m h^2, \alpha)$ are better for this particular realization.



A first type of method is based on wavelet filtering.
Their main advantage is that they are mildly model-dependent,
and mainly sensitive to the BAO feature in
the correlation function. Thus they are only weakly
affected by modeling errors. The price to pay is that
they are outperformed by some model-dependent methods
when $\mathcal{H}_0$ and $\mathcal{H}_1$ are well modeled.

Other methods are fully model-dependent. They assume
that the measurement $\hat{\xi}$ is Gaussian, which is well
verified on our simulations. They also preclude too fine a
binning, since they use the inverse of the covariance
matrix $C$ of $\hat{\xi}$ and can become unstable. However this
does not cause too much loss of information.

Among these methods, the most often used is the classical
$\chi^2$ method, based on the $\chi^2$ statistic. We found that it
only gives a rough estimate of the significance of the
BAO detection, and more precisely an overestimation.
This comes from the assumption of the method that the spaces of
model correlation functions are affine, which is not verified
in practice. As a consequence, the significance of the
rejection of some particular $\mathcal{H}_0$ models is overestimated.
Since the rejection of the full $\mathcal{H}_0$ hypothesis (the BAO
detection) is based on the "worst-case" $\mathcal{H}_0$ model, its
significance is also overestimated. Moreover the estimate of
the significance can become even less accurate with hypotheses
where the covariance matrix is model-dependent.

We proposed to use the $\Delta l$ method, which is a modified
version of the classical $\chi^2$ method. We first modify
the procedure for obtaining the correct significance. Using
simulations, we found that correct significances are
indeed lower than estimates of the classical $\chi^2$ method.
The price to pay is that the method becomes much more
expensive computationally. As a result we cannot use
as many $\mathcal{H}_0$ realizations as we want, which causes
imprecision at high significance. Yet this limitation is only
computational and restricted to the case of high significance
where the BAO detection is already clear.

The second modification consists in replacing the $\Delta\chi^2$
statistic by the $\Delta l$ statistic. The two coincide for a constant
covariance matrix but differ for a varying covariance
matrix. We found that the $\Delta l$ statistic gives better
results in the case of varying covariance. As we have seen
with a simple example, taking into account these variations can
affect both the BAO detection and cosmological parameter
constraints.

In the course of our study we also found that no-wiggles
models are rejected at a lower level than zero-baryon
models. This agrees with the analysis in Cabré &
Gaztañaga (2011), which uses no-wiggles models for $\mathcal{H}_0$
and finds that the BAO peak is rarely detected above
$3\sigma$ for current galaxy surveys. This comes from the fact
that no-wiggles correlation functions only lack the BAO
peak, whereas zero-baryon correlation functions have a
globally different shape. So there must be a clear distinction
between testing the existence of the BAO peak, and
testing the existence of baryons.

Let us summarize our main conclusions:

1. The choice of the hypotheses $\mathcal{H}_0$ and $\mathcal{H}_1$ is important
since it affects both the BAO detection and
cosmological parameter constraints. To be rigorous
one should take into account variations of the
covariance matrix. It should also be clear whether
one tests the existence of baryons or only the
existence of the BAO peak, because the expected
results are quite different.

2. We have presented a new method, the $\Delta l$ method,
which has two main advantages over the classical
$\chi^2$ method. Unlike the latter it provides the correct
significance, apart from imprecisions at high
significance. It also provides better results in the
case of varying covariance matrix.

We plan to apply the $\Delta l$ method for BAO detection in
the LRG sample of SDSS DR7. For this we also plan to
use more realistic hypotheses $\mathcal{H}_0$ and $\mathcal{H}_1$, by modeling
the variations of the covariance matrix $C(\xi)$. A more
realistic $\mathcal{H}_1$ hypothesis would also give more realistic
parameter constraints.

Hypotheses, pros, and cons of the different BAO detection methods



Wavelet methods
  Hypotheses: $\mathcal{H}_0$: no peak in $E[\hat{\xi}]$
              $\mathcal{H}_1$: peak in $E[\hat{\xi}]$
  Pros:       Mildly model-dependent → robust to modeling errors
  Cons:       Outperformed by some model-dependent methods when there are no modeling errors

Classical $\chi^2$
  Hypotheses: $\mathcal{H}_0$: $\exists\, \theta$ s.t. $\hat{\xi} \sim \mathcal{N}(\xi_{noBAO,\theta}, C)$
              $\mathcal{H}_1$: $\exists\, \theta$ s.t. $\hat{\xi} \sim \mathcal{N}(\xi_{BAO,\theta}, C)$
  Pros:       Generalized likelihood ratio
  Cons:       Model-dependent
              Overestimation of significance
              Constant covariance matrix
              Unstable for small binning
              Gaussian hypothesis

$\Delta l$ method
  Hypotheses: $\mathcal{H}_0$: $\exists\, \theta$ s.t. $\hat{\xi} \sim \mathcal{N}(\xi_{noBAO,\theta}, C_{noBAO,\theta})$
              $\mathcal{H}_1$: $\exists\, \theta$ s.t. $\hat{\xi} \sim \mathcal{N}(\xi_{BAO,\theta}, C_{BAO,\theta})$
  Pros:       Generalized likelihood ratio
              Variations of covariance matrix
              Better results than $\Delta\chi^2$ statistic for varying covariance matrix
  Cons:       Model-dependent
              Long computation time → imprecise for high significance
              Unstable for small binning
              Gaussian hypothesis

NOTES. - The most important points are in bold. We found that the Gaussian hypothesis is well verified in practice, and that using
large bins is not a serious problem. A major difference is whether methods give a correct estimate of the significance. Other important
differences come from the tested hypotheses $\mathcal{H}_0$ and $\mathcal{H}_1$: whether they are based on a full modeling of $\hat{\xi}$, and whether they allow variations of
the covariance matrix.



APPENDIX 
BEST-FIT $\chi^2$

We consider a class of binned model correlation functions, $\xi_\theta = (\xi_\theta(r_i))_{1 \le i \le n}$, with a $k$-dimensional parameter
$\theta = (\theta_1, \dots, \theta_k) \in \Theta$. We suppose that the estimator $\hat{\xi}$ of the correlation function is Gaussian with covariance matrix
$C$ and expectation inside the model space (i.e. $\exists\, \theta_0$ such that $\hat{\xi} \sim \mathcal{N}(\xi_{\theta_0}, C)$). We look at the $\chi^2_\theta$ statistic, which
depends on $\theta$

$$\chi^2_\theta = \sum_{1 \le i,j \le n} \left[\hat{\xi}(r_i) - \xi_\theta(r_i)\right] C^{-1}_{ij} \left[\hat{\xi}(r_j) - \xi_\theta(r_j)\right] \qquad (A1)$$

Now we make the important assumption that the space of model correlation functions $(\xi_\theta)_{\theta \in \Theta}$ is a $k$-dimensional
affine subspace of $\mathbb{R}^n$. Then the best-fit $\chi^2_\theta$ value follows a chi-square distribution with a number of degrees of freedom
equal to $n - k$, i.e. the measurement dimension minus the parameter space dimension

$$\min_{\theta} \chi^2_\theta \sim \chi^2_{n-k} \qquad (A2)$$

Since $C$ is a positive definite matrix, we can consider $C^{-1/2}$. Let us note $X = C^{-1/2}(\hat{\xi} - \xi_{\theta_0})$ and $X_\theta = C^{-1/2}(\xi_\theta - \xi_{\theta_0})$,
so that we can rewrite the $\chi^2_\theta$ statistic as

$$\chi^2_\theta = \|X - X_\theta\|^2 \qquad (A3)$$

This is the Karhunen-Loève transform, which consists in whitening the measurement vector $\hat{\xi}$. This means that
the resulting vector $X$ is a multivariate Gaussian variable with zero expected value and covariance matrix equal to the
identity. Indeed the covariance matrix of $X$ is equal to

$$E[X X^T] = C^{-1/2}\, E\left[(\hat{\xi} - \xi_{\theta_0})(\hat{\xi} - \xi_{\theta_0})^T\right] C^{-1/2} = C^{-1/2}\, C\, C^{-1/2} = I_n$$

with $I_n$ the $n \times n$ identity matrix. Thus, in any orthonormal basis of $\mathbb{R}^n$, the $n$ components of $X$ are independent
standard Gaussian variables. Let us write $F_\Theta = (X_\theta)_{\theta \in \Theta}$, which is a $k$-dimensional vector space, and $F_\Theta^\perp$ its
orthogonal complement of dimension $n - k$. Let us write $(Y_1, \dots, Y_{k+1}, \dots, Y_n)$ the components of $X$ in an orthonormal
basis, which has the first $k$ vectors in $F_\Theta$ and the last $n - k$ vectors in $F_\Theta^\perp$. Then the $Y_i$'s are independent standard
normal variables. Moreover $\chi^2_\theta$ is minimized when $X_\theta$ is the projection $X_{F_\Theta}$ of $X$ onto $F_\Theta$, and the minimum equals the
squared norm of the projection $X_{F_\Theta^\perp}$ of $X$ onto $F_\Theta^\perp$

$$\min_{\theta} \chi^2_\theta = \|X - X_{F_\Theta}\|^2 = \|X_{F_\Theta^\perp}\|^2 \qquad (A4)$$

$$= \sum_{i=k+1}^{n} Y_i^2 \qquad (A5)$$

This shows that the best-fit $\chi^2_\theta$ follows a chi-square distribution with $n - k$ degrees of freedom, i.e. $\min_\theta \chi^2_\theta \sim \chi^2_{n-k}$.
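The result can be checked numerically in a simple case. The sketch below is a toy Monte Carlo, not taken from our analysis: it assumes already whitened data (independent unit-variance bins) and the affine model space of straight lines $a + b\,r$, so that $k = 2$. The function names and the values of the true line are hypothetical.

```python
import random

def best_fit_chi2_line(r, y):
    """Minimum chi^2 of the model a + b*r for unit-variance, independent data."""
    n = len(r)
    r_mean = sum(r) / n
    y_mean = sum(y) / n
    s_rr = sum((ri - r_mean) ** 2 for ri in r)
    s_ry = sum((ri - r_mean) * (yi - y_mean) for ri, yi in zip(r, y))
    b = s_ry / s_rr                 # closed-form least-squares slope
    a = y_mean - b * r_mean         # closed-form least-squares intercept
    return sum((yi - a - b * ri) ** 2 for ri, yi in zip(r, y))

def mean_best_fit_chi2(n=10, n_trials=20000, seed=0):
    """Average best-fit chi^2 over simulated measurements."""
    rng = random.Random(seed)
    r = [float(i) for i in range(n)]
    total = 0.0
    for _ in range(n_trials):
        # Expectation inside the model space (a = 1, b = 0.5), whitened noise.
        y = [1.0 + 0.5 * ri + rng.gauss(0.0, 1.0) for ri in r]
        total += best_fit_chi2_line(r, y)
    return total / n_trials

print(mean_best_fit_chi2())  # should be close to n - k = 8
```

The mean of a chi-square variable equals its number of degrees of freedom, so an average close to $n - k$ is what equation (A2) predicts. For a model space that is not affine this check would in general fail.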

DIFFERENCE OF BEST-FITS $\chi^2$ IN NESTED MODELS

Here we consider two nested classes of model correlation functions, $\xi_\theta$ with $\theta \in \Theta_1$ for the first class and $\theta \in \Theta_2$ for
the second class. We suppose that $\Theta_1$ is $k$-dimensional and that $\Theta_2$ is $(k + l)$-dimensional with $\Theta_1 \subset \Theta_2$.

We still suppose that the estimator $\hat{\xi}$ is Gaussian with covariance matrix $C$ and expectation inside the restricted
class (i.e. $\exists\, \theta_0 \in \Theta_1$ such that $\hat{\xi} \sim \mathcal{N}(\xi_{\theta_0}, C)$). We also keep the assumption that the spaces of model correlation
functions $(\xi_\theta)_{\theta \in \Theta_1}$ and $(\xi_\theta)_{\theta \in \Theta_2}$ are affine subspaces of $\mathbb{R}^n$ of respective dimensions $k$ and $k + l$. Then the difference of
best-fits between the two classes $\Theta_1$ and $\Theta_2$ follows a chi-square distribution with a number of degrees of freedom equal
to $l$, i.e. the difference in the number of parameters of the two classes.

$$\min_{\theta \in \Theta_1} \chi^2_\theta - \min_{\theta \in \Theta_2} \chi^2_\theta \sim \chi^2_l \qquad (B1)$$

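This follows easily from appendix A. We consider again the Karhunen-Loève transforms $X = C^{-1/2}(\hat{\xi} - \xi_{\theta_0})$ and
$X_\theta = C^{-1/2}(\xi_\theta - \xi_{\theta_0})$. Let us write the model spaces $F_{\Theta_1} = (X_\theta)_{\theta \in \Theta_1}$ and $F_{\Theta_2} = (X_\theta)_{\theta \in \Theta_2}$, and their orthogonal
complements $F_{\Theta_1}^\perp$ and $F_{\Theta_2}^\perp$. We can write $(Y_1, \dots, Y_{k+1}, \dots, Y_{k+l+1}, \dots, Y_n)$ the components of $X$ in an orthonormal
basis, which has the first $k$ vectors in $F_{\Theta_1} \cap F_{\Theta_2}$, the next $l$ vectors in $F_{\Theta_1}^\perp \cap F_{\Theta_2}$ and the last $n - (k + l)$ vectors
in $F_{\Theta_1}^\perp \cap F_{\Theta_2}^\perp$. The components $Y_i$ are independent standard normal variables. Moreover for each class of model, $\chi^2_\theta$
is minimized when $X_\theta$ is the projection of $X$ onto the model space $F_\Theta$

$$\min_{\theta \in \Theta_1} \chi^2_\theta = \|X_{F_{\Theta_1}^\perp}\|^2 = \sum_{i=k+1}^{n} Y_i^2$$

$$\min_{\theta \in \Theta_2} \chi^2_\theta = \|X_{F_{\Theta_2}^\perp}\|^2 = \sum_{i=k+l+1}^{n} Y_i^2$$

So the best-fit difference is given by

$$\min_{\theta \in \Theta_1} \chi^2_\theta - \min_{\theta \in \Theta_2} \chi^2_\theta = \sum_{i=k+1}^{k+l} Y_i^2 \qquad (B2)$$

This shows that the difference of best-fits $\chi^2_\theta$ follows a chi-square distribution with $l$ degrees of freedom, i.e.
$\min_{\theta \in \Theta_1} \chi^2_\theta - \min_{\theta \in \Theta_2} \chi^2_\theta \sim \chi^2_l$.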
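As for appendix A, a toy Monte Carlo can illustrate the result. The sketch below assumes whitened data and two nested affine model spaces, constants ($k = 1$) inside straight lines ($k + l = 2$), with the true model in the restricted class. It is only an illustration with hypothetical function names.

```python
import random

def chi2_constant(y):
    """Best-fit chi^2 of a constant model (k = 1), unit-variance independent data."""
    y_mean = sum(y) / len(y)
    return sum((yi - y_mean) ** 2 for yi in y)

def chi2_line(r, y):
    """Best-fit chi^2 of the line a + b*r (k + l = 2), same noise assumptions."""
    n = len(r)
    r_mean = sum(r) / n
    y_mean = sum(y) / n
    s_rr = sum((ri - r_mean) ** 2 for ri in r)
    s_ry = sum((ri - r_mean) * (yi - y_mean) for ri, yi in zip(r, y))
    b = s_ry / s_rr
    a = y_mean - b * r_mean
    return sum((yi - a - b * ri) ** 2 for ri, yi in zip(r, y))

def delta_chi2_samples(n=10, n_trials=20000, seed=1):
    """Differences of best-fit chi^2 between the restricted and extended classes."""
    rng = random.Random(seed)
    r = [float(i) for i in range(n)]
    samples = []
    for _ in range(n_trials):
        # Expectation inside the restricted class (a constant equal to 1).
        y = [1.0 + rng.gauss(0.0, 1.0) for _ in r]
        samples.append(chi2_constant(y) - chi2_line(r, y))
    return samples

samples = delta_chi2_samples()
print(sum(samples) / len(samples))  # should be close to l = 1
```

The differences are non-negative because the classes are nested, and their average should be close to $l$, the mean of a chi-square variable with $l$ degrees of freedom.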

OPTIMALITY OF THE LIKELIHOOD RATIO

Let us consider the likelihood ratio $\Lambda(\hat{\xi}) = \mathcal{L}_{\mathcal{H}_0}(\hat{\xi}) / \mathcal{L}_{\mathcal{H}_1}(\hat{\xi})$ and another statistic $S(\hat{\xi})$ for testing the hypotheses $\mathcal{H}_0$
and $\mathcal{H}_1$. Let us suppose that $\mathcal{H}_1$ is preferred over $\mathcal{H}_0$ for low values of $S(\hat{\xi})$, as for the likelihood ratio (if this is not
the case we just consider $-S$).

We first consider statistical tests for a given significance $\alpha$. The test based on the likelihood ratio is

• if $\Lambda(\hat{\xi}) < \eta_\Lambda$ then accept $\mathcal{H}_1$

• if $\Lambda(\hat{\xi}) > \eta_\Lambda$ then accept $\mathcal{H}_0$

The test based on the statistic $S$ is

• if $S(\hat{\xi}) < \eta_S$ then accept $\mathcal{H}_1$

• if $S(\hat{\xi}) > \eta_S$ then accept $\mathcal{H}_0$

with $\alpha = P(\Lambda(\hat{\xi}) < \eta_\Lambda \mid \mathcal{H}_0)$ and $\alpha = P(S(\hat{\xi}) < \eta_S \mid \mathcal{H}_0)$. The Neyman-Pearson lemma states that the power of
the likelihood ratio test is at least as large as the power of any other test with the same significance. This means that the
probability of accepting $\mathcal{H}_1$ if it is true is larger for the likelihood ratio test.

$$P(\Lambda(\hat{\xi}) < \eta_\Lambda \mid \mathcal{H}_1) \ge P(S(\hat{\xi}) < \eta_S \mid \mathcal{H}_1) \qquad (C1)$$



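Now we consider the significances corresponding to realization values $\Lambda(\hat{\xi}) = x$ and $S(\hat{\xi}) = y$, which we write
respectively $\alpha_\Lambda$ and $\alpha_S$

$$\alpha_\Lambda(x) = P(\Lambda(\hat{\xi}) < x \mid \mathcal{H}_0)$$

$$\alpha_S(y) = P(S(\hat{\xi}) < y \mid \mathcal{H}_0)$$

We have $\alpha_\Lambda(\eta_\Lambda) = \alpha$ and $\alpha_S(\eta_S) = \alpha$. Since $\alpha_\Lambda$ and $\alpha_S$ are increasing functions, the conditions $\Lambda(\hat{\xi}) < \eta_\Lambda$ and
$S(\hat{\xi}) < \eta_S$ are equivalent respectively to $\alpha_\Lambda(\Lambda(\hat{\xi})) < \alpha$ and $\alpha_S(S(\hat{\xi})) < \alpha$. If we simplify the notations and write
$\alpha_\Lambda(\hat{\xi})$ for $\alpha_\Lambda(\Lambda(\hat{\xi}))$ and $\alpha_S(\hat{\xi})$ for $\alpha_S(S(\hat{\xi}))$, we obtain from equation (C1) that for any $\alpha$

$$P(\alpha_\Lambda(\hat{\xi}) < \alpha \mid \mathcal{H}_1) \ge P(\alpha_S(\hat{\xi}) < \alpha \mid \mathcal{H}_1) \qquad (C2)$$

Let us show that this implies

$$E\left[\alpha_\Lambda(\hat{\xi}) \mid \mathcal{H}_1\right] \le E\left[\alpha_S(\hat{\xi}) \mid \mathcal{H}_1\right] \qquad (C3)$$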



In what follows, we always keep the condition $\mathcal{H}_1$ in expectations and probabilities, so we omit it to simplify
the notations. We write $F_\Lambda$ and $F_S$ the cumulative distribution functions given by $F_\Lambda(\alpha) = P(\alpha_\Lambda(\hat{\xi}) < \alpha)$ and
$F_S(\alpha) = P(\alpha_S(\hat{\xi}) < \alpha)$. Equation (C2) implies for any $\alpha$ and $p$

$$F_\Lambda(\alpha) \ge F_S(\alpha) \qquad (C4)$$

$$F_\Lambda^{-1}(p) \le F_S^{-1}(p) \qquad (C5)$$

The expectation of $\alpha_\Lambda(\hat{\xi})$ is given by

$$E[\alpha_\Lambda(\hat{\xi})] = \int \alpha \, dF_\Lambda(\alpha) = \int_0^1 F_\Lambda^{-1}(p) \, dp \qquad (C6)$$

where we made the change of variable $p = F_\Lambda(\alpha)$. The same computation can be made for $E[\alpha_S(\hat{\xi})]$, and since
$F_\Lambda^{-1}(p) \le F_S^{-1}(p)$ we get $E[\alpha_\Lambda(\hat{\xi})] \le E[\alpha_S(\hat{\xi})]$. So the expected significance given as a p-value is minimized for the
likelihood ratio.

If we measure the significance as a number of $\sigma$ instead of a p-value, both inequalities (C4) and (C5) are reversed.
So the inequality on the expected values is also reversed, and the expected number of $\sigma$ is maximized for the likelihood
ratio.
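Inequality (C3) can be illustrated with a toy example of two simple hypotheses, which is an assumption of this sketch and not the setting of our BAO analysis: $\mathcal{H}_0$: $X \sim \mathcal{N}((0, 0), I_2)$ against $\mathcal{H}_1$: $X \sim \mathcal{N}((\mu, 0), I_2)$. The likelihood ratio only depends on the first component, while the alternative statistic $S = -(x_1 + x_2)$ is diluted by the uninformative second component. All names and the value of $\mu$ are hypothetical.

```python
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cumulative distribution function

def expected_p_values(mu=2.0, n_trials=20000, seed=2):
    """Mean p-values under H1 for the likelihood ratio and for another statistic.

    The likelihood ratio L_H0 / L_H1 = exp(-mu*x1 + mu^2/2) decreases with x1,
    so its p-value is P(x1' > x1 | H0) = 1 - Phi(x1).
    The statistic S = -(x1 + x2) also favors H1 at low values, and
    (x1 + x2) / sqrt(2) is standard normal under H0.
    """
    rng = random.Random(seed)
    sum_lr = 0.0
    sum_s = 0.0
    for _ in range(n_trials):
        x1 = mu + rng.gauss(0.0, 1.0)   # realization drawn under H1
        x2 = rng.gauss(0.0, 1.0)
        sum_lr += 1.0 - PHI(x1)
        sum_s += 1.0 - PHI((x1 + x2) / 2 ** 0.5)
    return sum_lr / n_trials, sum_s / n_trials

p_lr, p_s = expected_p_values()
print(p_lr, p_s)  # the likelihood ratio should give the smaller mean p-value
```

In this toy case the two expectations are known in closed form, $1 - \Phi(\mu/\sqrt{2})$ for the likelihood ratio and $1 - \Phi(\mu/2)$ for $S$, so the simulated averages can be compared with them directly.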



